
UserGuide QualityStage


QualityStage™

Designer User Guide

Version 7.0
August 2003
Important Notice
This document, and the software described or referenced in it, are confidential and proprietary to Ascential Software
Corporation ("Ascential"). They are provided under, and are subject to, the terms and conditions of a license agreement between
Ascential and the licensee, and may not be transferred, disclosed, or otherwise provided to third parties, unless otherwise
permitted by that agreement. No portion of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of
Ascential. The specifications and other information contained in this document for some purposes may not be complete, current,
or correct, and are subject to change without notice. NO REPRESENTATION OR OTHER AFFIRMATION OF FACT
CONTAINED IN THIS DOCUMENT, INCLUDING WITHOUT LIMITATION STATEMENTS REGARDING CAPACITY,
PERFORMANCE, OR SUITABILITY FOR USE OF PRODUCTS OR SOFTWARE DESCRIBED HEREIN, SHALL BE
DEEMED TO BE A WARRANTY BY ASCENTIAL FOR ANY PURPOSE OR GIVE RISE TO ANY LIABILITY OF ASCENTIAL
WHATSOEVER. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL ASCENTIAL BE LIABLE FOR
ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER
RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE. If you are acquiring this software on behalf of the U.S. government, the Government shall have only "Restricted
Rights" in the software and related documentation as defined in the Federal Acquisition Regulations (FARs) in Clause 52.227.19
(c) (2). If you are acquiring the software on behalf of the Department of Defense, the software shall be classified as "Commercial
Computer Software" and the Government shall have only "Restricted Rights" as defined in Clause 252.227-7013 (c) (1) of DFARs.
© 2003, 1999-2002 Ascential Software Corporation. All rights reserved.
QualityStage, QualityStage Designer, QualityStage Real Time, QualityStage SERP, QualityStage DPID Interface Solution for
ATLAS, QualityStage GeoLocator, QualityStage WAVES, QualityStage CASS, and QualityStage Z4Change are trademarks of
Ascential Software Corporation or its affiliates and may be registered in the United States or other jurisdictions.
Adobe and Acrobat are trademarks of Adobe Systems Incorporated.
Data Warehouse Center is a trademark; ISPF/PDF MVS, TSO, IBM, and MVS are registered trademarks of International
Business Machines Corporation.
UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Ltd.
Windows and Windows NT are trademarks of Microsoft Corporation.
Winsock REXECD/NT is a copyright of Denicomp Systems.
Other marks are the property of the owners of those marks.
Published by Ascential Software.
This Product may contain or utilize third-party components subject to the following (as applicable):

This product includes Hypersonic SQL.


ORIGINAL LICENSE (a.k.a. "hypersonic_lic.txt")
For content, code, and products originally developed by Thomas Mueller and the Hypersonic SQL Group:

Copyright (c) 1995-2000 by the Hypersonic SQL Group. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following
conditions are met: 1) Redistributions of source code must retain the above copyright notice, this list of conditions and the
following disclaimer. 2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions
and the following disclaimer in the documentation and/or other materials provided with the distribution. 3) All advertising
materials mentioning features or use of this software must display the following acknowledgment: "This product includes
Hypersonic SQL." 4) Products derived from this software may not be called "Hypersonic SQL" nor may "Hypersonic SQL"
appear in their names without prior written permission of the Hypersonic SQL Group. 5) Redistributions of any form
whatsoever must retain the following acknowledgment: "This product includes Hypersonic SQL." This software is provided
"as is" and any expressed or implied warranties, including, but not limited to, the implied warranties of merchantability
and fitness for a particular purpose are disclaimed. In no event shall the Hypersonic SQL Group or its contributors be
liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to,
procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on
any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out
of the use of this software, even if advised of the possibility of such damage. This software consists of voluntary
contributions made by many individuals on behalf of the Hypersonic SQL Group.
Copyright © 2002 Sun Microsystems, Inc. All rights reserved. Redistribution and use in source and binary forms, with or
without modification, are permitted provided that the following conditions are met: 1. Redistribution of source code must
retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistribution in binary form
must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution. Neither the name of Sun Microsystems, Inc. or the names of contributors
may be used to endorse or promote products derived from this software without specific prior written permission. You
acknowledge that this software is not designed, licensed or intended for use in the design, construction, operation or
maintenance of any nuclear facility.
This product includes software developed by the Apache Software Foundation (http://www.apache.org/). Copyright ©
1999-2000 The Apache Software Foundation. All rights reserved. Redistribution and use in source and binary forms, with
or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code
must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary
form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution. 3. The end-user documentation included with the redistribution, if
any, must include the following acknowledgment: "This product includes software developed by the Apache Software
Foundation (http://www.apache.org/)." Alternately, this acknowledgment may appear in the software itself, if and wherever
such third-party acknowledgments normally appear. 4. The names "Xerces" and "Apache Software Foundation" must not
be used to endorse or promote products derived from this software without prior written permission. For written
permission, please contact apache@apache.org. 5. Products derived from this software may not be called "Apache", nor may
"Apache" appear in their name, without prior written permission of the Apache Software Foundation.
TCL/TK License Terms. This software is copyrighted by the Regents of the University of California, Sun Microsystems,
Inc., Scriptics Corporation, and other parties. The following terms apply to all files associated with the software unless
explicitly disclaimed in individual files. The authors hereby grant permission to use, copy, modify, distribute, and license
this software and its documentation for any purpose, provided that existing copyright notices are retained in all copies and
that this notice is included verbatim in any distributions. No written agreement, license, or royalty fee is required for any
of the authorized uses. Modifications to this software may be copyrighted by their authors and need not follow the licensing
terms described here, provided that the new terms are clearly indicated on the first page of each file where they apply. IN
NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE, ITS
DOCUMENTATION, OR ANY DERIVATIVES THEREOF, EVEN IF THE AUTHORS HAVE BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE. THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. THIS SOFTWARE IS PROVIDED ON AN "AS
IS" BASIS, AND THE AUTHORS AND DISTRIBUTORS HAVE NO OBLIGATION TO PROVIDE MAINTENANCE,
SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. GOVERNMENT USE: If you are acquiring this
software on behalf of the U.S. government, the Government shall have only "Restricted Rights" in the software and related
documentation as defined in the Federal Acquisition Regulations (FARs) in Clause 52.227.19 (c) (2). If you are acquiring
the software on behalf of the Department of Defense, the software shall be classified as "Commercial Computer Software"
and the Government shall have only "Restricted Rights" as defined in Clause 252.227-7013 (c) (1) of DFARs.
Notwithstanding the foregoing, the authors grant the U.S. Government and others acting in its behalf permission to use
and distribute the software in accordance with the terms specified in this license.
Copyright © 1997-1998 DUNDAS SOFTWARE LTD., all rights reserved.
Copyright © 2001 Ironring Software (http://www.ironringsoftware.com).
Copyright © 1987 Regents of the University of California. All rights reserved.
Copyright © 1996, 2000, 2001, Nara Institute of Science and Technology. All rights reserved. Redistribution and use in
source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1.
Redistribution of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2.
Redistribution in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the distribution. 3. All advertising materials mentioning
features or use of this software must display the following acknowledgments: This product includes software developed by
Nara Institute of Science and Technology. 4. The name Nara Institute of Science and Technology may not be used to endorse
or promote products derived from this software without specific prior written permission.
ANTLR 1989-2000 Developed by jGuru.com (MageLang Institute), http://www.ANTLR.org and http://www.jGuru.com.
LAPACK Users’ Guide, 3rd Edition, Society for Industrial and Applied Mathematics.
Copyright 1990, by Alfalfa Software Incorporated, Cambridge, Massachusetts. All rights reserved. Permission to use,
copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice
appear in supporting documentation, and that Alfalfa’s name not be used in advertising or publicity pertaining to
distribution of the software without specific, written permission.
ICU License - ICU 1.8.1 and later
COPYRIGHT AND PERMISSION NOTICE
Copyright (c) 1995-2002 International Business Machines Corporation and others
All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy,
modify, merge, publish, distribute, and/or sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, provided that the above copyright notice(s) and this permission notice appear in all copies of the
Software and that both the above copyright notice(s) and this permission notice appear in supporting documentation.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT
OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR
PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote
the sale, use or other dealings in this Software without prior written authorization of the copyright holder.
All trademarks and registered trademarks mentioned herein are the property of their respective owners.
Jetty License Revision: 3.7
Preamble:
The intent of this document is to state the conditions under which the Jetty Package may be copied, such that the
Copyright Holder maintains some semblance of control over the development of the package, while giving the users of the
package the right to use, distribute and make reasonable modifications to the Package in accordance with the goals and
ideals of the Open Source concept as described at http://www.opensource.org.
It is the intent of this license to allow commercial usage of the Jetty package, so long as the source code is distributed or
suitable visible credit given or other arrangements made with the copyright holders. Additional information available at
http://jetty.mortbay.org
Definitions:
"Jetty" refers to the collection of Java classes that are distributed as a HTTP server with servlet capabilities and associated
utilities.
"Package" refers to the collection of files distributed by the Copyright Holder, and derivatives of that collection of files
created through textual modification.
"Standard Version" refers to such a Package if it has not been modified, or has been modified in accordance with the wishes
of the Copyright Holder.
"Copyright Holder" is whoever is named in the copyright or copyrights for the package.
Mort Bay Consulting Pty. Ltd. (Australia) is the "Copyright Holder" for the Jetty package.
"You" is you, if you're thinking about copying or distributing this Package.
"Reasonable copying fee" is whatever you can justify on the basis of media cost, duplication charges, time of people
involved, and so on. (You will not be required to justify it to the Copyright Holder, but only to the computing community at
large as a market that must bear the fee.)
"Freely Available" means that no fee is charged for the item itself, though there may be fees involved in handling the item.
It also means that recipients of the item may redistribute it under the same conditions they received it.
0. The Jetty Package is Copyright (c) Mort Bay Consulting Pty. Ltd. (Australia) and others. Individual files in this package
may contain additional copyright notices. The javax.servlet packages are copyright Sun Microsystems Inc.
1. The Standard Version of the Jetty package is available from http://jetty.mortbay.org.
2. You may make and distribute verbatim copies of the source form of the Standard Version of this Package without
restriction, provided that you include this license and all of the original copyright notices and associated disclaimers.
3. You may make and distribute verbatim copies of the compiled form of the Standard Version of this Package without
restriction, provided that you include this license.
4. You may apply bug fixes, portability fixes and other modifications derived from the Public Domain or from the Copyright
Holder. A Package modified in such a way shall still be considered the Standard Version.
5. You may otherwise modify your copy of this Package in any way, provided that you insert a prominent notice in each
changed file stating how and when you changed that file, and provided that you do at least ONE of the following:
a) Place your modifications in the Public Domain or otherwise make them Freely Available, such as by posting said
modifications to Usenet or an equivalent medium, or placing the modifications on a major archive site such as ftp.uu.net, or
by allowing the Copyright Holder to include your modifications in the Standard Version of the Package.
b) Use the modified Package only within your corporation or organization.
c) Rename any non-standard classes so the names do not conflict with standard classes, which must also be provided, and
provide a separate manual page for each non-standard class that clearly documents how it differs from the Standard
Version.
d) Make other arrangements with the Copyright Holder.
6. You may distribute modifications or subsets of this Package in source code or compiled form, provided that you do at
least ONE of the following:
a) Distribute this license and all original copyright messages, together with instructions (in the about dialog, manual page
or equivalent) on where to get the complete Standard Version.
b) Accompany the distribution with the machine-readable source of the Package with your modifications. The modified
package must include this license and all of the original copyright notices and associated disclaimers, together with
instructions on where to get the complete Standard Version.
c) Make other arrangements with the Copyright Holder.
7. You may charge a reasonable copying fee for any distribution of this Package. You may charge any fee you choose for
support of this Package. You may not charge a fee for this Package itself. However, you may distribute this Package in
aggregate with other (possibly commercial) programs as part of a larger (possibly commercial) software distribution
provided that you meet the other distribution requirements of this license.
8. Input to or the output produced from the programs of this Package do not automatically fall under the copyright of this
Package, but belong to whomever generated them, and may be sold commercially, and may be aggregated with this
Package.
9. Any program subroutines supplied by you and linked into this Package shall not be considered part of this Package.
10. The name of the Copyright Holder may not be used to endorse or promote products derived from this software without
specific prior written permission.
11. This license may change with each release of a Standard Version of the Package. You may choose to use the license
associated with the version you are using or the license of the latest Standard Version.
12. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE.
13. If any superior law implies a warranty, the sole remedy under such shall be, at the Copyright Holder's option, either a)
return of any price paid or b) use or reasonable endeavours to repair or replace the software.
14. This license shall be read under the laws of Australia.
The End
This license was derived from the Artistic license published on http://www.opensource.com
The Apache Software License, Version 1.1
This product includes software developed by the Apache Software Foundation (http://www.apache.org/).
Copyright (c) 2000 The Apache Software Foundation. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following
conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided with the distribution.
3. The end-user documentation included with the redistribution, if any, must include the following acknowledgment: "This
product includes software developed by the Apache Software Foundation (http://www.apache.org/)." Alternately, this
acknowledgment may appear in the software itself, if and wherever such third-party acknowledgments normally appear.
4. The names "Apache" and "Apache Software Foundation" must not be used to endorse or promote products derived from
this software without prior written permission. For written permission, please contact apache@apache.org.
5. Products derived from this software may not be called "Apache", nor may "Apache" appear in their name, without prior
written permission of the Apache Software Foundation.
THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR ITS
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
This software consists of voluntary contributions made by many individuals on behalf of the Apache Software Foundation.
For more information on the Apache Software Foundation, please see <http://www.apache.org/>.
Portions of this software are based upon public domain software originally written at the National Center for
Supercomputing Applications, University of Illinois, Urbana-Champaign.

Common Public License Version 0.5


THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS COMMON PUBLIC LICENSE
("AGREEMENT"). ANY USE, REPRODUCTION OR DISTRIBUTION OF THE PROGRAM CONSTITUTES
RECIPIENT'S ACCEPTANCE OF THIS AGREEMENT.
1. DEFINITIONS
"Contribution" means:
a) in the case of the initial Contributor, the initial code and documentation distributed under this Agreement, and
b) in the case of each subsequent Contributor:
i) changes to the Program, and
ii) additions to the Program;
where such changes and/or additions to the Program originate from and are distributed by that particular Contributor. A
Contribution 'originates' from a Contributor if it was added to the Program by such Contributor itself or anyone acting on
such Contributor's behalf. Contributions do not include additions to the Program which: (i) are separate modules of
software distributed in conjunction with the Program under their own license agreement, and (ii) are not derivative works
of the Program.
"Contributor" means any person or entity that distributes the Program.
"Licensed Patents" mean patent claims licensable by a Contributor which are necessarily infringed by the use or sale of its
Contribution alone or when combined with the Program.
"Program" means the Contributions distributed in accordance with this Agreement.
"Recipient" means anyone who receives the Program under this Agreement, including all Contributors.
2. GRANT OF RIGHTS
a) Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide,
royalty-free copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, distribute and
sublicense the Contribution of such Contributor, if any, and such derivative works, in source code and object code form.
b) Subject to the terms of this Agreement, each Contributor hereby grants Recipient a non-exclusive, worldwide,
royalty-free patent license under Licensed Patents to make, use, sell, offer to sell, import and otherwise transfer the
Contribution of such Contributor, if any, in source code and object code form. This patent license shall apply to the
combination of the Contribution and the Program if, at the time the Contribution is added by the Contributor, such
addition of the Contribution causes such combination to be covered by the Licensed Patents. The patent license shall not
apply to any other combinations which include the Contribution. No hardware per se is licensed hereunder.
c) Recipient understands that although each Contributor grants the licenses to its Contributions set forth herein, no
assurances are provided by any Contributor that the Program does not infringe the patent or other intellectual property
rights of any other entity. Each Contributor disclaims any liability to Recipient for claims brought by any other entity
based on infringement of intellectual property rights or otherwise. As a condition to exercising the rights and licenses
granted hereunder, each Recipient hereby assumes sole responsibility to secure any other intellectual property rights
needed, if any. For example, if a third party patent license is required to allow Recipient to distribute the Program, it is
Recipient's responsibility to acquire that license before distributing the Program.
d) Each Contributor represents that to its knowledge it has sufficient copyright rights in its Contribution, if any, to grant
the copyright license set forth in this Agreement.

3. REQUIREMENTS
A Contributor may choose to distribute the Program in object code form under its own license agreement, provided that:
a) it complies with the terms and conditions of this Agreement; and
b) its license agreement:
i) effectively disclaims on behalf of all Contributors all warranties and conditions, express and implied, including
warranties or conditions of title and non-infringement, and implied warranties or conditions of merchantability and fitness
for a particular purpose;
ii) effectively excludes on behalf of all Contributors all liability for damages, including direct, indirect, special, incidental
and consequential damages, such as lost profits;
iii) states that any provisions which differ from this Agreement are offered by that Contributor alone and not by any other
party; and
iv) states that source code for the Program is available from such Contributor, and informs licensees how to obtain it in a
reasonable manner on or through a medium customarily used for software exchange.
When the Program is made available in source code form:
a) it must be made available under this Agreement; and
b) a copy of this Agreement must be included with each copy of the Program.
Contributors may not remove or alter any copyright notices contained within the Program.
Each Contributor must identify itself as the originator of its Contribution, if any, in a manner that reasonably allows
subsequent Recipients to identify the originator of the Contribution.
4. COMMERCIAL DISTRIBUTION
Commercial distributors of software may accept certain responsibilities with respect to end users, business partners and
the like. While this license is intended to facilitate the commercial use of the Program, the Contributor who includes the
Program in a commercial product offering should do so in a manner which does not create potential liability for other
Contributors. Therefore, if a Contributor includes the Program in a commercial product offering, such Contributor
("Commercial Contributor") hereby agrees to defend and indemnify every other Contributor ("Indemnified Contributor")
against any losses, damages and costs (collectively "Losses") arising from claims, lawsuits and other legal actions brought
by a third party against the Indemnified Contributor to the extent caused by the acts or omissions of such Commercial
Contributor in connection with its distribution of the Program in a commercial product offering. The obligations in this
section do not apply to any claims or Losses relating to any actual or alleged intellectual property infringement. In order to
qualify, an Indemnified Contributor must: a) promptly notify the Commercial Contributor in writing of such claim, and b)
allow the Commercial Contributor to control, and cooperate with the Commercial Contributor in, the defense and any
related settlement negotiations. The Indemnified Contributor may participate in any such claim at its own expense.
For example, a Contributor might include the Program in a commercial product offering, Product X. That Contributor is
then a Commercial Contributor. If that Commercial Contributor then makes performance claims, or offers warranties
related to Product X, those performance claims and warranties are such Commercial Contributor's responsibility alone.
Under this section, the Commercial Contributor would have to defend claims against the other Contributors related to
those performance claims and warranties, and if a court requires any other Contributor to pay any damages as a result, the
Commercial Contributor must pay those damages.

5. NO WARRANTY
EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING,
WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely responsible for
determining the appropriateness of using and distributing the Program and assumes all risks associated with its exercise
of rights under this Agreement, including but not limited to the risks and costs of program errors, compliance with
applicable laws, damage to or loss of data, programs or equipment, and unavailability or interruption of operations.
6. DISCLAIMER OF LIABILITY
EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR ANY CONTRIBUTORS
SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM
OR THE EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
7. GENERAL
If any provision of this Agreement is invalid or unenforceable under applicable law, it shall not affect the validity or
enforceability of the remainder of the terms of this Agreement, and without further action by the parties hereto, such
provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable.
If Recipient institutes patent litigation against a Contributor with respect to a patent applicable to software (including a
cross-claim or counterclaim in a lawsuit), then any patent licenses granted by that Contributor to such Recipient under this
Agreement shall terminate as of the date such litigation is filed. In addition, if Recipient institutes patent litigation against
any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Program itself (excluding combinations of
the Program with other software or hardware) infringes such Recipient's patent(s), then such Recipient's rights granted
under Section 2(b) shall terminate as of the date such litigation is filed.
Table of Contents

Preface
About This Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
Prerequisites for Using QualityStage . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xxviii
Related Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xxviii
Documentation Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
QualityStage Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
Additional Information and Assistance . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxii

Chapter 1
Welcome
Using Re-engineered Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-1
Introducing QualityStage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-2
QualityStage and QualityStage Real Time. . . . . . . . . . . . . . . . . . . . . . . .1-2
Product Highlights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-3
Supports Data Quality Management Standards . . . . . . . . . . . . . . . . . . .1-3
Feature Highlights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-4
Benefits Highlights. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-4
How QualityStage Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-5
Investigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-5
Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-5
Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-6
Survivorship and Formatting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1-7

Chapter 2
The Workflow for Creating Re-engineered Data
What is Re-engineered Data? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-1
Overview of the Re-engineering Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-2
Overview of Phase One. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-3
Overview of Phase Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-3
Overview of Phase Three . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-4
Overview of Phase Four . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-4
Phase One: Understand the Business Goals . . . . . . . . . . . . . . . . . . . . . . . . . .2-4
How High Quality Data Meets Business Goals . . . . . . . . . . . . . . . . . . . .2-5
Example of How Business Goals Determine Data Re-engineering Requirements . . . . . . . . . . . . . . . . . . . . .2-6
Phase Two: Understand the Source Data . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-6
Step One: Prepare for QualityStage . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-7
General Knowledge About the Source Data . . . . . . . . . . . . . . . . . . . .2-8
File Format of Source Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-8
Preparing Data for QualityStage . . . . . . . . . . . . . . . . . . . . . . . . . . .2-10
Step Two: Investigate the Source Data . . . . . . . . . . . . . . . . . . . . . . . . . .2-10
Organizing Source Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-11
Parsing Source Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-11
Classifying Source Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-12
Analyzing Patterns in Source Data. . . . . . . . . . . . . . . . . . . . . . . . . .2-12
Step Three: Evaluate the Results and Redefine the Project . . . . . . . . .2-12
Phase Three: Design and Develop the Re-engineering Application . . . . . . .2-13
Step One: Conditioning the Source Data . . . . . . . . . . . . . . . . . . . . . . . .2-14
About Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-14
Decisions You Make . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-14
Step Two: Matching the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-15
Example of Matching Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-16
Step Three: Determining Surviving Records and Formatting . . . . . . . .2-17
Keeping All Duplicate Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-17
Keeping Only One Record . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-17
Formatting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-18
Phase Four: Evaluating Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2-18

Chapter 3
Using the QualityStage Development Environment
Installing QualityStage Designer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-2
Starting QualityStage Designer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-2
Using the QualityStage Main Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-4
Using the QualityStage Menu Bar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-4
File Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-5
Edit Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-6
View Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-6
Rules Menu. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-6
Help Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-7
Using the Left Pane of the QualityStage Main Window . . . . . . . . . . . . .3-7
Using the Right Pane of the QualityStage Main Window . . . . . . . . . . . .3-8
Using the QualityStage Toolbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-10
Using QualityStage Dialog Boxes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-10
Selecting Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11
Moving Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11
Additional Menus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-11
Browsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-12
Setting QualityStage Designer Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-12
Local Working Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-13
Standardize Process Definition Directory . . . . . . . . . . . . . . . . . . . . . . . .3-13
Default Import Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-13
Preferred Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-14
Data Warehouse Center . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-14
How to Set Designer Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3-14

Chapter 4
Working with Projects
Creating QualityStage Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-1
Adding a Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-2
Copying a Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3
Deleting Projects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-3
Exporting Projects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-4
Exporting Datafile Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-6
Exporting Datafile Definitions via MetaBrokers . . . . . . . . . . . . . . . . . . .4-6
Exporting Datafile Definitions to MetaStage . . . . . . . . . . . . . . . . . . . . . .4-7
Importing Projects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-8
Importing a COBOL Copybook. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-9
Importing Datafile Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-12
Importing Datafile Definitions via MetaBrokers . . . . . . . . . . . . . . . . . .4-12
Importing Data Definitions from MetaStage . . . . . . . . . . . . . . . . . . . . .4-14
Working with Data Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-14
Input File Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-15
About Defining Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-15
About the Results File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-16
About Arrays. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-16
Creating a New Datafile Definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-16
Copying a Datafile Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-18
Modifying a Datafile Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-19
Working with Datafields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-21
Creating a Datafield Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-21
Modifying a Datafield Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-24
Deleting a Datafield Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-25
Using the Data Field Wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-26
Defining Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-27
Using QualityStage Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-28
Using the Interface to Data Warehouse Center . . . . . . . . . . . . . . . .4-29
Setting Up Licensed Stages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4-31

Chapter 5
Setting Up Run Profiles
Creating a Run Profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-1
What Run Profiles Define . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-1
More About Run Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-2
Creating and Managing Run Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-2
Creating Run Profiles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-2
Copying, Modifying, or Deleting Run Profiles . . . . . . . . . . . . . . . . . . . . .5-4
Defining an OS/390 Run Profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-4
Defining a UNIX or Windows Run Profile . . . . . . . . . . . . . . . . . . . . . . . . . .5-10
Defining a Local Windows Run Profile . . . . . . . . . . . . . . . . . . . . . . . . . . . . .5-16

Chapter 6
Building Jobs
Why You Use Jobs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-2
Building QualityStage Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-3
Creating a New Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-4
Renaming a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-5
Defining a Job Using Existing Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-5
Adding Existing Stages to a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-6
Reordering or Removing Stages in a Job . . . . . . . . . . . . . . . . . . . . . . . . .6-6
Setting Output Files to Include/Exclude for a Job . . . . . . . . . . . . . . . . . .6-7
Defining and Modifying Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-8
Creating a New Stage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-8
Modifying an Existing Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-9
Creating a New Stage from an Existing One . . . . . . . . . . . . . . . . . . . . . .6-9
Deleting a Stage from a Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6-10

Chapter 7
Deploying Jobs
About Deploying and Running Jobs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-1
Run Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-1
About Deploying Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-2
About Running Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-2
About Run Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-2
File Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-3
Data Stream Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-3
Parallel Extender Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-4
Comparing Run Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-5
Deploying a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-5
How to Deploy a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-6
Deploying Jobs in Data Stream Mode or Parallel Extender Mode . . . . .7-7
Deploying Jobs in File Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-7
Using the File Mode Execution Dialog Box for Deploying Jobs. . . . . . . .7-8
Deploying a Job Creates a Project File Structure . . . . . . . . . . . . . . . . . . . . . .7-9
Moving Input Data to the Correct Project Library Location . . . . . . . . . . . . .7-9
Deploying Jobs on an OS/390 Server. . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-9
Deploying Jobs on a UNIX Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7-10
Deploying Jobs on a Windows Server . . . . . . . . . . . . . . . . . . . . . . . . . . .7-10
Deploying Jobs on a Local Windows Server . . . . . . . . . . . . . . . . . . . . . .7-11

Chapter 8
Running Jobs
About Running Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-1
Remote Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-2
Local Windows Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-2
Running a Job from QualityStage Designer . . . . . . . . . . . . . . . . . . . . . . . . . .8-2
Running in Data Stream Mode or Parallel Extender Mode. . . . . . . . . . .8-6
Running in File Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-7
Running a Job from the Command Line on UNIX Systems. . . . . . . . . . . . . .8-8
Running a Job from the Command Line on Windows Systems . . . . . . . . . . .8-9
Using Parallel Extender Persistent Data Sets Instead of QualityStage Data Files . . . . . . . . . . . . . . . . . . . . .8-10
Integrating QualityStage Parallel Extender Jobs with DataStage Parallel Extender Jobs . . . . . . . . . . . . . . . . . . . . .8-11
Restarting a Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-12
Viewing Job Output Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .8-13

Chapter 9
Defining Investigate Stages
Using an Investigate Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-2
Creating an Investigate Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-3
Using Character Investigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-5
Using the Pattern Reports. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-5
Using Discrete Investigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-6
Using Concatenate Investigation . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-7
Using the Field Mask. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-7
Creating a Character Investigate Stage . . . . . . . . . . . . . . . . . . . . . . . . . .9-9
Using Word Investigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-11
Using Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-12
Using Pattern Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-13
Using Word Frequency Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-14
Using Word Classification Reports . . . . . . . . . . . . . . . . . . . . . . . . . .9-16
Specifying Advanced Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-18
Creating a Word Investigation Stage . . . . . . . . . . . . . . . . . . . . . . . . . . .9-21
Running Investigate Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-23
Running in Parallel Extender Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-26
Running in File Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9-27

Chapter 10
Defining Standardize Stages
Using the Standardize Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-2
About Rule Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-4
Standardization Processing Flow for U.S. Records . . . . . . . . . . . . . . . .10-4
Domain Pre-Processor Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-5
Why You Use the Domain Pre-Processor Rule Sets . . . . . . . . . . . . . . . .10-6
Preparing the Input File for the Domain Pre-Processor . . . . . . . . . . . .10-8
Domain-Specific Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-9
Validation Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-10
Standardized Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-10
Rules Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-11
Defining Standardize Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-11
Defining the Input File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-11
Inserting Literals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-11
Delimiter Literals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-12
Defining the Results File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-12
Creating a Standardize Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-13
Selecting Rule Sets, Fields, and Literals . . . . . . . . . . . . . . . . . . . . . . .10-15
Using the Append Field Selection Dialog Box . . . . . . . . . . . . . . . .10-20
Using the Data Selection for Reports Dialog Box . . . . . . . . . . . . .10-21
Specifying Case Formatting Options. . . . . . . . . . . . . . . . . . . . . . . . . . .10-22
Using Classification Tokens to Specify Fields for Case Formatting . . .10-23
Applying Case Formatting Rules . . . . . . . . . . . . . . . . . . . . . . . . . .10-23
Case Formatting Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-23
Running Standardize Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-24
Running in Data Stream Mode or Parallel Extender Mode. . . . . . . . .10-26
Running in File Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-28
Standardizing a Multinational Address File Using Standardize. . . . . . . .10-29
About the Country Identifier Rule Set . . . . . . . . . . . . . . . . . . . . . . . . .10-30
Using the Country Identifier Rule Set . . . . . . . . . . . . . . . . . . . . . . . . .10-31
Preparing the Input File for the Country Identifier. . . . . . . . . . . . . . .10-31
Managing the Rule Sets and Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-32
Accessing the Rules Management Dialog Box . . . . . . . . . . . . . . . . . . .10-33
Viewing or Modifying Rule Set Files and Tables . . . . . . . . . . . . . .10-34
Creating New Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10-34

Chapter 11
Defining Multinational Standardize Stages
The Multinational Standardize Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-2
Which Countries Can Be Standardized . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-2
City-Level Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-2
Street-Level Standardization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-3
Modifying Standardization Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . .11-3
Input File Requirements and Recommendations . . . . . . . . . . . . . . . . . . . . .11-4
Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-4
Recommendations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-4
Input Field Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-5
Creating a Multinational Standardize Stage . . . . . . . . . . . . . . . . . . . . . . . .11-5
Running Multinational Standardize Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . .11-9
Running in Data Stream Mode or Parallel Extender Mode. . . . . . . . .11-12
Running in File Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-14
Multinational Standardize Output Fields . . . . . . . . . . . . . . . . . . . . . . . . . .11-15
About the Output File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .11-18

Chapter 12
Defining Match Stages
About Matching Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-2
Using Match Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-3
One-To-One Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-4
Many-To-One Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-4
Matching for Unduplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-5
Blocking Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-5
The Strategy For Using Match Passes . . . . . . . . . . . . . . . . . . . . . . .12-5
Matching Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-6
About m-probability and u-probability . . . . . . . . . . . . . . . . . . . . . . .12-7
About Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-7
About Cutoffs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-8
About Unduplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-8
Reviewing the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-9
Extracting Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-9
Defining Match Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-10
Defining Input Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-10
Defining Output Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-10
Defining a Match Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-11
Creating a Match Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-12
Using a Template Match Specification . . . . . . . . . . . . . . . . . . . . . . . . .12-15
Template Match Data Requirements . . . . . . . . . . . . . . . . . . . . . . . . . .12-15
Individual Customer Match. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-15
Individual File Unduplication Match . . . . . . . . . . . . . . . . . . . . . . .12-16
Business File Unduplication Match . . . . . . . . . . . . . . . . . . . . . . . .12-18
Individual Householding Match . . . . . . . . . . . . . . . . . . . . . . . . . . .12-19
Business Householding Match . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-20
To Use a Template Match Specification: . . . . . . . . . . . . . . . . . . . .12-22
Creating a Custom Match Specification . . . . . . . . . . . . . . . . . . . . . . . .12-25
Defining a Pass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-27
Specifying Blocking Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-27
Specifying Matching Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-30
Assigning Match Pass Cutoffs . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-34
Specifying the m-Probability and u-Probability . . . . . . . . . . . . . . .12-35
Using Reverse Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-35
Using Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-36
Specifying Weight Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-36
Defining Vartypes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-40
Running Match Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-42
Running in Data Stream Mode or Parallel Extender Mode. . . . . . . . .12-45
Running in File Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-46
Setting Match Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-48
Using an XML Match Spec to Create a Match . . . . . . . . . . . . . . . . . . . . . .12-49
How Can I Use an XML Match Spec? . . . . . . . . . . . . . . . . . . . . . . . . . .12-50
Selecting an XML Match Spec. . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-50
Creating Your Own XML Match Spec Files . . . . . . . . . . . . . . . . . . . . .12-51
Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-52
XML Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-52
Editing XML Match Spec Templates . . . . . . . . . . . . . . . . . . . . . . .12-52
XML DTD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-53
XML Vocabulary Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12-54

Chapter 13
Working with Match Reports
About Match Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-2
Using the Default Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-2
Customizing a Match Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-3
Defining a Custom Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-4
Specifying the Report Layout. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-7
About Match Extracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-13
Using the Default Extract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-14
Customizing a Match Extract. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-15
Defining a Custom Extract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-16
Specifying the Extract Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-20
Creating Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-22
Maintaining Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-23
Extract Statements and Arguments . . . . . . . . . . . . . . . . . . . . . . . .13-23
Using the Statistics Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-28
Run Information Section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-28
Frequency Information Section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-29
Summary Statistics Section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-30
Histogram Section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-31
Viewing Extracts and Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .13-32

Chapter 14
Defining Survive Stages
Using the Survive Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-2
Grouping Records . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-3
Defining Survive Stage Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-4
Defining the Input File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-4
Defining the Results File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-4
Creating a Survive Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-4
Using the Survive Stage Wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-5
Defining Survive Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-8
Defining Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-8
Defining a Simple Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-9
Defining a Complex Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-11
Using the Survivorship Rule Expression Builder . . . . . . . . . . . . .14-12
Adding the Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-13
Selecting Data for a Predefined QualityStage Report . . . . . . . . . . . . .14-15
Modifying and Maintaining Survivorship Rules . . . . . . . . . . . . . .14-16
Running Survive Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-18
Running in Data Stream Mode or Parallel Extender Mode. . . . . . . . .14-21
Running in File Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-22
Creating Rules Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-23
Rule Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-24
Rule Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-25
Rule Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-26
Examples of Rule Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-27
First Record in Group Survives. . . . . . . . . . . . . . . . . . . . . . . . . . . .14-28
At Least One from Each Group Survives . . . . . . . . . . . . . . . . . . . .14-28
Date as a Target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-28
Multiple Targets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-28
Using the File from Which the Record Originated. . . . . . . . . . . . .14-28
Using the Length of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-29
Using Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-29
Multiple Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .14-29

Chapter 15
Working with QualityStage Reports
Using Stage Wizards to Prepare Data for Predefined QualityStage Reports . . 15-2
Creating and Running QualityStage Reports Using Unprepared Data . . .15-2
Preparing Your Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-3
Converting Output Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-3
Adding a .txt Extension to Flat Files . . . . . . . . . . . . . . . . . . . . . . . .15-3
File Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-3
Creating Customized Access Reports. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-4
Designing a Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-4
Creating a Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-4
Creating a Query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-5
Creating a Query With the Query Wizard . . . . . . . . . . . . . . . . . . . . . . .15-5
Creating a Report in the Design View. . . . . . . . . . . . . . . . . . . . . . . . . . .15-6
Creating a Report with the Report Wizard . . . . . . . . . . . . . . . . . . . . . . .15-6
Testing and Debugging the Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-7
Creating a Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-8
Generating and Viewing QualityStage Reports . . . . . . . . . . . . . . . . . . . . . .15-9
Specifying the Data Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-10
ODBC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-11
Microsoft Access Database. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-11
Flat Files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-11
Specifying the Reports Database Location . . . . . . . . . . . . . . . . . . . . . .15-12
Selecting and Running a QualityStage Report . . . . . . . . . . . . . . . . . . .15-12
Saved Report Entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-13
About Predefined QualityStage Reports . . . . . . . . . . . . . . . . . . . . . . . . . . .15-14
Reports Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-14
Investigation Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-15
Preparing the Investigation Output File . . . . . . . . . . . . . . . . . . . .15-15
Investigation Character Discrete and Character Type Reports . .15-15
Investigation Word Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-16
Country-specific Standardization Reports . . . . . . . . . . . . . . . . . . . . . .15-17
Country Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-17
Choosing the Correct Standardization Report . . . . . . . . . . . . . . . .15-17
Country-Specific Standardization Reports . . . . . . . . . . . . . . . . . . .15-18
Country-Specific Standardization Attributes. . . . . . . . . . . . . . . . .15-19
Standardization CC Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-20
Standardization CC Appended Report . . . . . . . . . . . . . . . . . . . . . .15-20
Standardization CC Report with Prep . . . . . . . . . . . . . . . . . . . . . .15-21
Standardization CC Appended Report with Prep . . . . . . . . . . . . .15-22
Standardization CC Name Report . . . . . . . . . . . . . . . . . . . . . . . . .15-22
Standardization CC Appended Name Report. . . . . . . . . . . . . . . . .15-23
Standardization CC Area Report . . . . . . . . . . . . . . . . . . . . . . . . . .15-23
Standardization CC Appended Area Report. . . . . . . . . . . . . . . . . .15-24
Standardization CC Address Report. . . . . . . . . . . . . . . . . . . . . . . .15-24
Standardization CC Appended Address Report . . . . . . . . . . . . . . .15-25
Standardization CC Prep Report . . . . . . . . . . . . . . . . . . . . . . . . . .15-25
Standardization CC Appended Prep Report. . . . . . . . . . . . . . . . . .15-26
Standardization CC Summary Report . . . . . . . . . . . . . . . . . . . . . .15-26
Standardization CC Appended Summary Report . . . . . . . . . . . . .15-26
Standardization CC Summary Report with Prep . . . . . . . . . . . . .15-27
Standardization CC Appended Summary Report with Prep . . . . .15-27
Standardization WAVES/Multinational Report. . . . . . . . . . . . . . .15-28
Standardization WAVES/Multinational Appended Report . . . . . .15-28
Matching Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-29
Match Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-30
Match Grouping Summary Report . . . . . . . . . . . . . . . . . . . . . . . . .15-30
Match Histogram Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-30
Match Output Review Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-30
Match Summary Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-31
Match Unduplication Summary Report . . . . . . . . . . . . . . . . . . . . .15-31
Survivorship Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15-31

Chapter 16
Using the QualityStage Data File and Report Viewer
Selecting the Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-2
Choosing the File to View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-2
Using the QualityStage Data File and Report Viewer . . . . . . . . . . . . . . . . .16-3
Navigating. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-4
Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16-5
Troubleshooting the Data File and Report Viewer . . . . . . . . . . . . . . . . . . . .16-6

Appendix A
Importing Projects from MVS and UNIX into QualityStage Designer
Preparing Your Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Collecting Information from the UNIX or MVS System . . . . . . . . . . . . A-2
For UNIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
For MVS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Transferring the PDS Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
Updating the Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
Creating the IMF File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5
Input File Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5
Job List File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6
Control List File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-6
Data Definition List File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-7
Using the jcl_cnv Command. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-7
Conversion Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-8
Converting Data File Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . A-8
Sort Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-9
Limited Operation Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
Understanding Conversion Problems . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
Warning Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
Fatal Error Messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-11
Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-11
Things to Check After Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-12

Appendix B
Match Comparisons
ABS_DIFF Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
AN_DINT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
AN_INT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
CHAR Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4
CNT_DIFF Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
D_INT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-6
D_USPS Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-7
DATE8 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-8
DELTA_PERCENT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-10
DISTANCE Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-11
INT_TO_INT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-12
INTERVAL_NOPAR Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-13
INTERVAL_PARITY Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-14
LR_CHAR Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-15
LR_UNCERT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-16
MULT_EXACT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-18
MULT_RANGE Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-18
MULT_UNCERT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-19
NAME_UNCERT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-20
NUMERIC Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-21
PREFIX Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-22
PRORATED Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-22
TIME Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-24
UNCERT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-25
USPS Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-26
USPS_DINT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-27
USPS_INT Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-30

Appendix C
Rule Set Files
Features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-1
Rule Set Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Dictionary File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
Field Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-5
Classification Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-6
Threshold Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-7
The Null Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-8
Pattern-Action File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-8
Pattern Matching Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-8
Tokenization and Classification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-9
Pattern-Action File Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-11
Rule Set Description File (.PRC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-14
Lookup Tables (.TBL) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-14
Override Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-14
Where Rule Set Files and Override Tables Are Located . . . . . . . . . . . . . . C-15

Appendix D
More About Using Rules
Country Identifier Rule Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1
Input File: Country Code Delimiters. . . . . . . . . . . . . . . . . . . . . . . . . . . . D-2
Output File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-2
Domain Pre-Processor Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-3
Input File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-3
Why You Use the Domain Pre-Processor Rule Sets . . . . . . . . . . . . . . . . D-4
Domain Pre-Processor File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-5
Domain Pre-Processor Dictionary File . . . . . . . . . . . . . . . . . . . . . . . . . . D-6
Domain Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-6
Reporting Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-7
User Flag Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-8
Domain Masks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-8
Upgrading Pre-Processor Rule Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-8
Domain-Specific Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-10
Domain-Specific File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-11
Domain-Specific Dictionary Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-11
Business Intelligence Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-12
Matching Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-12
Reporting Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-13
Data Flag Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-13
Validation Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-14
Validation File Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-14
VDATE Rule Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-15
Default Parsing Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-15
Input Date Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-15
Output Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-16
Business Intelligence Output Fields . . . . . . . . . . . . . . . . . . . . . . . . D-16
Error Reporting Output Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-17
VEMAIL Rule Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-17
Default Parsing Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-18
Parsing Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-18
Business Intelligence Output Fields . . . . . . . . . . . . . . . . . . . . . . . . D-18
Error Reporting Output Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-19
VPHONE Rule Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-19
Classification Table Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-20
Default Parsing Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-20
Parsing Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-20
Validation Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-21
Business Intelligence Output Fields . . . . . . . . . . . . . . . . . . . . . . . . D-22
Error Reporting Output Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-23
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-23
VTAXID Rule Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-24
Default Parsing Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-24
Parsing Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-24
Validation Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-25
Business Intelligence Output Fields . . . . . . . . . . . . . . . . . . . . . . . . D-25
Error Reporting Output Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-26
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-26

Appendix E
Customizing and Testing Rule Sets
Rule Set Customization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-1
Rule Set Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-2
Using Override Tables to Customize Rule Sets . . . . . . . . . . . . . . . . . . . . . . E-2
Domain Pre-Processor Override Tables . . . . . . . . . . . . . . . . . . . . . . . . . E-4
Domain-Specific Override Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-4
Validation Override Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-5
QualityStage WAVES/Multinational Address Override Tables . . . . . . E-5
Working with Multiple Projects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-5
Domain Pre-Processor Rule Set Process . . . . . . . . . . . . . . . . . . . . . . . . . E-6
Domain-Specific, Validation, and WAVES/Multinational Address Rule Set Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-8
Using Overrides to Customize Rule Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . E-9
Domain Pre-Processor Overrides. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-11
Adding Classification Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . E-11
Adding Input Pattern and Field Pattern Overrides. . . . . . . . . . . . E-15
Adding Input Text and Field Text Overrides . . . . . . . . . . . . . . . . . E-19
Creating Domain-Specific, Validation, and WAVES/Multinational Address Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-22
Adding Classification Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . E-22
Adding Input Pattern and Unhandled Pattern Overrides. . . . . . . E-22
Adding Input Text and Unhandled Text Overrides . . . . . . . . . . . . E-27
Modifying and Maintaining Overrides . . . . . . . . . . . . . . . . . . . . . . . . . E-32
Deleting Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-32
Modifying Overrides. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-32
Creating an Override Similar to an Existing One . . . . . . . . . . . . . E-33
Action Codes for Domain-Specific, Validation, and WAVES/Multinational Standardize Rule Sets . . . . . . . . . . . . . . . . . . . . . . . E-33
Testing Standardization Rule Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-34
Opening the Standardization Rules Analyzer . . . . . . . . . . . . . . . . . . . E-35
Testing a Domain Pre-Processor Rule Set . . . . . . . . . . . . . . . . . . . . . . E-37
Testing a Domain-Specific Rule or Validation Rule Set. . . . . . . . . . . . E-39
User Modification Subroutines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-40
Subroutine Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-40
Country Identifier User Subroutines . . . . . . . . . . . . . . . . . . . . . . . . . . E-40
Domain Pre-Processor User Subroutines . . . . . . . . . . . . . . . . . . . . . . . E-41
Input Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-41
Continuation Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-41
Field Modifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-42
Domain-Specific User Subroutines . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-42
Input Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-43
Unhandled Modifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-43

Appendix F
ISO Country Codes

Appendix G
Sharing Dictionary Fields and Variable Names Across Rule Sets
Scoping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-1
Log File Warning Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-2
Scoping For Dictionary Field Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-2
Backward Compatibility for Dictionary Field Name Scopes . . . . . . . . . G-3
Modifying a Previous Rule Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-3
Scoping for Variable Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . G-4
Backward Compatibility for Variable Name Scopes . . . . . . . . . . . . . . . G-5

Appendix H
Using AuditStage with QualityStage
Source File Aggregation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-2
Overview of Building a Source File from Data Tables . . . . . . . . . . . . . . H-2
Exporting a Sample Source File . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-3
Pre-Standardization Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-3
Validating at the Row Level. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-3
Using Your AuditStage Results in QualityStage . . . . . . . . . . . . . . . . . . H-4
Tuning Standardization and Matching Jobs . . . . . . . . . . . . . . . . . . . . . . . . . H-4
Testing the Results Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-4
Accessing QualityStage Data for Testing . . . . . . . . . . . . . . . . . . . . . H-5
Sampling the Results Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-5
Creating and Using a Sample Data Set . . . . . . . . . . . . . . . . . . . . . . H-5
Maintaining Your QualityStage Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H-6
Index



Preface

Ascential QualityStage™ is a comprehensive development environment
for building applications that re-engineer data. This data
re-engineering environment is designed to help programmers,
programmer analysts, business analysts, and others re-engineer data
to meet business objectives and data quality management standards.
Using QualityStage, you can quickly and easily process large stores
of data, selectively conforming the data as needed. QualityStage
provides a set of integrated modules for accomplishing data
re-engineering tasks such as:
• Investigating
• Conditioning (standardizing)
• Matching
• Building relationships
• Resolving conflicts
• Formatting data

About This Guide


This guide describes the workflow for creating re-engineered data and
how to work with QualityStage. It is intended for managers and
technical and data analysts responsible for implementing
QualityStage projects.

Prerequisites for Using QualityStage


To use QualityStage, you need to know:
• How to use Windows-based applications
• How to use your Windows operating system
• How to use the operating system of your QualityStage server
• How to use the text editors on your QualityStage server
This guide assumes that you have a good understanding of the data on
which you will be working.
In addition, we strongly recommend that you:
• Take the Ascential training course Fundamentals of QualityStage
• Walk through the example in the QualityStage: Getting Started
manual

Related Documentation
In addition to this user guide, the QualityStage documentation also
includes:
• QualityStage: Getting Started
Uses a simple real-life example to walk the reader through a data
re-engineering project. Topics include creating a project; defining
data files; and running jobs that use the Investigate, Standardize,
Match, and Survive stages.
• QualityStage OS/390 Server Guide
Describes how to install the server software, how to verify
installation, how to transfer jobs in a non-FTP environment, and
how to troubleshoot common problems. It also includes some
sample JCL.
• QualityStage UNIX, Linux, and Windows Server Guide
Describes how to install the QualityStage server software on
UNIX, Linux and remote Windows platforms, how to verify installation,
how to transfer jobs in a non-FTP environment, and how to
troubleshoot common problems. It also describes how to install
and configure the QualityStage Real Time Manager (QSRT
Manager).
• QualityStage Stages Reference Guide
Contains technical information about stages used for building
QualityStage jobs: Transfer, Collapse, Select, Sort, Parse, Build,
Unijoin, and Abbreviate.
• QualityStage Match Concepts and Reference Guide
Describes and provides reference information about the technical
aspects of the Match stage. Topics include specifying the data
dictionary, types of matches, defining variables, and special
variables.
• QualityStage Pattern-Action Reference Guide
Provides technical and reference information about the
pattern-action language for configuring business rules processes.
• QualityStage WAVES Stage Guide
Describes how to use the WAVES stage to standardize, correct,
verify, and enhance addresses against country-specific postal
reference files.
• QualityStage CASS and Z4Change Stages Guide
Describes how to use the QualityStage CASS and QualityStage
Z4Change stages to verify U.S. mailing lists for USPS
certification.
• QualityStage SERP Stage Guide
Describes how to use the QualityStage SERP stage to verify
Canadian mailing lists for Canada Post certification.
• QualityStage DPID User Guide
Describes how to use the QualityStage DPID stage to verify
Australian mailing lists for certification.
• QualityStage Guide to the Data Warehouse Center Interface
Describes the installation, architecture, and logical data flow of
the module. Also provides a walk-through description of how to
use the dialog boxes.
• QualityStage Real Time Developer’s Guide
Describes the Java, COM, and C API function calls you can use in
QualityStage Real Time application programs.
• QualityStage Help
Provides online help at both the context-sensitive and dialog box
levels within QualityStage Designer. Describes valid values for
entry fields, gives fuller explanations of selection fields, and
provides brief step-by-step descriptions of how to perform certain
processes and jobs within a dialog box.

Documentation Conventions
This guide uses the following conventions:
• User entries and book titles appear in italic typeface.
• Arrows represent a menu path, for example Select Edit ➤ Paste.
• Examples are represented by Arial font.
• Note: indicates nice-to-know information.
• Important: identifies vital information.
• Caution: warns you about actions that could cause damage to data
or unintentional termination of processing.

QualityStage Terminology
As of release 7.0 of QualityStage, certain INTEGRITY terms have
changed. This guide uses the new QualityStage terminology
throughout.


The following table lists the older INTEGRITY terms and their new
QualityStage equivalents:

Old Term               New Term

INTEGRITY              QualityStage
Procedure              Job
Proc                   Job
Built procedure        Job
Operator               Stage
Operation              Stage
Super operator         Stage
Super op               Stage
Pre-built procedure    Stage
Licensed operation     Stage
To stage               To deploy
Staging                Deploying
SuperSTAN              Standardize
SuperMATCH             Match
Project                unchanged
Investigation          Investigate
Survivorship           Survive

Here are several examples of how the new terms are used:

Old Usage                                New Usage

Investigation pre-built procedure        Investigate stage
SuperSTAN pre-built procedure            Standardize stage
SuperMATCH pre-built procedure           Match stage
Survivorship pre-built procedure         Survive stage
The procedure comprises the following    The job comprises the following
operations and pre-built procedures.     stages.
Stage and run the procedure.             Deploy and run the job.

Additional Information and Assistance


Additional information is available through Ascential training classes.
If you require assistance or have questions about QualityStage, you
can contact Ascential Customer Support by:
• Calling:
(866) INFONOW
• Writing:
support@ascentialsoftware.com
• Logging onto the Ascential Support e.Service Web site at:
www.ascential.com:8080/eservice/index.html
A product expert is available to help you between the hours of 8:30 am
and 5:30 pm, EST.
For current information about Ascential and its products log on to
http://www.ascential.com/


1

Welcome

This chapter introduces data re-engineering applications,
QualityStage, and the information you need to use this manual.

Using Re-engineered Data


Your company uses re-engineered data when creating or maintaining
any of the following systems:
• Customer Information Systems.
• Data Warehouses and Data Marts.
• Business Intelligence Initiatives.
• ERP Implementations.
• Operational Data Stores.
• e-Commerce.
In addition, there are other uses for re-engineered data such as:
• Data Conversion/Consolidation Projects.
• Data Audits and Data Validation.
• Consolidated Customer, Vendor, and Product Views.
• Mergers and Acquisitions.
• Master-Patient Indexing.
• Consolidated Billing.
• Ongoing Maintenance of Operational Data.

Introducing QualityStage
QualityStage is a client/server application. QualityStage Designer
provides a client interface for defining and customizing data
re-engineering jobs. QualityStage Designer runs on a Windows
workstation.
Whereas QualityStage Designer defines how source data will be
processed, the QualityStage server accesses the source data and
processes it into the target re-engineered data. The server maintains
the data fields and executes the QualityStage data re-engineering jobs
that you build with QualityStage Designer.
The QualityStage server runs on the following systems:
• Windows
• OS/390
• UNIX
For more information on the operating systems that the QualityStage
server runs on, see the QualityStage OS/390 Server Guide and the
QualityStage UNIX, Linux, and Windows Server Guide.

QualityStage and QualityStage Real Time


In its standard configuration (as described in this guide),
QualityStage jobs run as batch processes. That is, you use a complete
input file to run a QualityStage job in a stand-alone processing
environment. QualityStage Designer lets you directly invoke the
QualityStage server.
You can use QualityStage Real Time to integrate QualityStage
functionality within other software applications. This feature provides
you with APIs that let you run your QualityStage jobs (built with
QualityStage Designer) remotely on the QualityStage server. This
option requires the purchase of a QualityStage Real Time license. See
your Ascential representative for more information.

Product Highlights
This section highlights key capabilities and benefits.

Supports Data Quality Management Standards


QualityStage helps you manage and maintain data quality so that
your company can rely upon its corporate data investment. It helps
organizations produce high-quality data as well as lower the cost of
data management.
QualityStage enables organizations to produce high-quality,
consolidated information by:
• Correlating and matching selected entities to uncover
relationships.
• Automating data re-engineering tasks according to an
organization's business rules and information requirements.
• Standardizing data in free-form fields and across disparate data
sources.
QualityStage lowers costs of data management by:
• Providing control over data and the resulting quality of
information.
• Uncovering information buried in free-form fields and identifying
relationships between data values.
• Offering the capability to view data as related information.
• Ensuring that the data populating the new system is of the highest
data quality.

Feature Highlights
QualityStage features:
• Robust data re-engineering solution – including data investigation,
parsing, matching, and reconciliation
• GUI/menu-driven, easy to learn
• Stages and tables that automate common data re-engineering
operations
• Over 24 match comparison algorithms providing a full spectrum of
fuzzy matching functions
• Callable libraries for real-time matching
• Requires few programming resources and reduces the need for
clerical staff
• Can be customized for your business rules
• Build once, run anywhere, run everywhere

Benefits Highlights
QualityStage benefits your company because it:
• Constructs consolidated customer views for the purpose of
cross-selling, up-selling, and customer retention
• Reduces time and cost to implement ERP (SAP, Baan, PeopleSoft,
JDE, etc.) initiatives
• Improves customer support and identifies your most profitable
customers
• Maximizes purchasing power for consolidated vendor projects
• Improves inventory control management via consolidated views of
inventory/product – to sell more products while increasing profit
margins

How QualityStage Works


QualityStage provides control over data quality and enables
companies to re-purpose operational data into strategic enterprise
intelligence. It does this through a process that applies lexical
analysis (parsing and pattern processing) and probabilistic matching
according to your business rules.
The process comprises:
• Investigation
• Conditioning (standardization)
• Matching
• Survivorship and formatting

Investigation
Data investigation gives 100 percent visibility into the actual
condition of data, providing a sound understanding of the information
in legacy sources. This lets you identify and correct data problems
before they corrupt new systems.
Investigation parses and analyzes free-form and single domain fields,
determining the number and frequency of unique values and
classifying or assigning a business meaning to each occurrence of a
value within a field. As a result, investigation:
• Uncovers trends, potential anomalies, metadata discrepancies, and
undocumented business practices
• Identifies invalid or default values
• Reveals common terminology
• Verifies the reliability of fields proposed as matching criteria
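
As a rough illustration of the frequency analysis described above, the following Python sketch counts the unique values in a single field and flags values that look like defaults or blanks. It is not QualityStage code. The sample field values and the SUSPECT_VALUES list are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical placeholder values that often signal a default entry.
SUSPECT_VALUES = {"", "N/A", "UNKNOWN", "999999999"}

def value_frequencies(values):
    """Return (value, count) pairs for a field, most frequent first."""
    normalized = [v.strip().upper() for v in values]
    return Counter(normalized).most_common()

def suspect_values(values):
    """Return the distinct values that look like defaults or blanks."""
    return sorted({v.strip().upper() for v in values} & SUSPECT_VALUES)

ssn_field = ["123456789", "999999999", "n/a", "999999999", " "]
print(value_frequencies(ssn_field))
print(suspect_values(ssn_field))
```

A field in which one value dominates the frequency list, or in which suspect values are common, is probably a poor choice as a matching criterion.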

Conditioning
Based on the understanding of the data gleaned from investigation,
conditioning standardizes and reformats data from multiple systems
to create a consistent data representation with fixed, discrete fields,
according to your company rules. As a result, conditioning:
• Creates fixed-field, addressable data
• Facilitates effective matching
• Enables output formatting
In conditioning, QualityStage provides the ability to transform any
data type into your desired standards. It standardizes data —
applying consistent representations, correcting misspellings, and
incorporating business or industry standards — and formats it,
placing each value into a single domain field and transforming each
single domain field into a standard format.

Matching
Matching ensures data integrity. Matching:
• Identifies duplicate entities (such as customers, suppliers,
products, and parts) within one or more files
• Creates a consolidated view of an entity
• Performs householding
• Establishes cross-reference linkage
• Enriches existing data with new attributes from external sources
QualityStage applies probabilistic matching technology to any
relevant attribute — evaluating user-defined full fields, parts of fields,
or even individual characters — and assigns agreement weights and
disagreement weights to key data elements, based on a number of
factors such as frequency distribution, discriminating value, and
reliability.
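
This guide does not give the formulas QualityStage uses for these weights. A common formulation in probabilistic record linkage (the Fellegi-Sunter model) derives them from two probabilities per field: m, the chance that the field agrees when two records truly refer to the same entity, and u, the chance that it agrees by coincidence. The Python sketch below uses that model with made-up m and u values, purely as an illustration of how frequency and reliability can shape a weight.

```python
import math

def agreement_weight(m, u):
    """Weight added when a field agrees: log2(m / u)."""
    return math.log2(m / u)

def disagreement_weight(m, u):
    """Weight added (usually negative) when a field disagrees."""
    return math.log2((1 - m) / (1 - u))

def composite_weight(field_probs, agreements):
    """Sum the per-field weights for one record pair."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if agreements[field]:
            total += agreement_weight(m, u)
        else:
            total += disagreement_weight(m, u)
    return total

# Hypothetical probabilities: a rarely shared value such as an SSN
# discriminates far better than a common one such as a state code.
probs = {"ssn": (0.95, 0.0001), "state": (0.98, 0.05)}
print(composite_weight(probs, {"ssn": True, "state": True}))
print(composite_weight(probs, {"ssn": False, "state": True}))
```

Under these assumptions, agreement on the SSN contributes far more to the total than agreement on the state, which is the behavior the paragraph above describes.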
It can also gauge the number of differences in a field, to account for
errors such as transpositions (for example, in Social Security
numbers). It can match records exactly, character by character, or find
nonexact matches (and estimate the likelihood that two records
match) in the absence of common keys. It matches records faster and
more accurately than any visual inspection or other matching tool on
the market and produces auditable and legally defensible match results.
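
This guide does not say how QualityStage counts the differences in a field. One standard measure that treats a transposition as a single difference is the optimal string alignment distance, sketched below in Python as an assumption about the kind of comparison involved.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: counts insertions, deletions,
    substitutions, and adjacent transpositions, each as one difference."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # adjacent transposition
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

# Two numbers that differ only by swapped adjacent digits.
print(edit_distance("123456789", "123465789"))
```

A comparison like this lets two identifiers that differ by one keying error still contribute partial agreement instead of counting as a complete disagreement.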

Survivorship and Formatting


Survivorship and formatting ensure that the best available data
survives and is correctly prepared for the target destination.
• Survivorship consolidates duplicate records, creating a “best of
breed” representation of the matched data, enabling companies to
cross-populate all data sources with the best available data.
• Formatting implements the business and mapping rules, creating
the necessary output structures for the target application and
identifying fields that don’t conform to load standards.
Performed on groups of matched records, survivorship creates a “best
of breed” data representation. Survivorship:
• Supplies missing values in one record with values from other
records on the same entity
• Resolves conflicting data values on an entity according to your
business rules
• If desired, enriches existing data with data from external sources.
The result is complete, accurate surviving data. Formatting
customizes the output to meet specific business and technical
requirements.
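
The following Python sketch illustrates one possible survivorship rule: for each field, keep the first nonblank value, checking the matched records in order of how much you trust their source. The rule, the field names, and the source names are assumptions for this example. Your own business rules determine what actually survives.

```python
def survive(group, priority):
    """Build one surviving record from a group of matched records.

    For each field, take the first nonblank value, checking records in
    the order given by `priority` (a list of source names, best first).
    """
    rank = {source: i for i, source in enumerate(priority)}
    ordered = sorted(group, key=lambda r: rank.get(r["source"], len(rank)))
    fields = [f for r in ordered for f in r if f != "source"]
    survivor = {}
    for field in dict.fromkeys(fields):  # unique fields, order preserved
        for record in ordered:
            value = record.get(field, "").strip()
            if value:
                survivor[field] = value
                break
        else:
            survivor[field] = ""  # no record supplied a value
    return survivor

matched = [
    {"source": "billing", "name": "J. Hill", "phone": "6175550100"},
    {"source": "crm", "name": "James M. Hill", "phone": ""},
]
print(survive(matched, priority=["crm", "billing"]))
```

Here the name comes from the preferred source, and the missing phone number is supplied by the other record on the same entity.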

2

The Workflow for Creating Re-engineered Data

Your company’s data contains valuable corporate information that
your company needs in order to conduct business, whether it’s
managing customers and products, managing operations, evaluating
corporate performance, or providing business intelligence.
QualityStage helps you deliver and maintain data quality so that your
company can rely upon its corporate data investment.
To help you work efficiently with QualityStage and get the most
appropriate results, you need to make certain assumptions and
decisions before you start working with QualityStage.
Whereas the rest of this manual explains how to use QualityStage,
this chapter provides an overview of the process that delivers
re-engineered data and defines the workflow for each phase of the
process.

What is Re-engineered Data?


Whether your company is transitioning from one or more information
systems to another, upgrading its organization and its processes, or
integrating and leveraging information across the enterprise, your
goal is to determine the requirements and structure of the data that
will address the business goal. Data that is restructured to conform to
these new requirements is called re-engineered data.
Any process for creating re-engineered data should:
• Resolve conflicting and ambiguous meanings for data values.
• Bring to the surface new or hidden attributes from free-form and
loosely controlled source fields.
• Standardize data to give addressability to components of data.
• Identify duplication and relationships among such business
entities as customers, prospects, vendors, suppliers, parts,
locations, and events.
• Create one unique view of the business entity.
• Let you enrich the re-engineered data, for example by adding
address data certified by the Coding Accuracy Support System (CASS).

Overview of the Re-engineering Process


Knowledge of the overall workflow will help you streamline your data
re-engineering implementation.
Creating re-engineered data is a four-phase approach:
• Phase One: Understand business goals and how they determine
your requirements.
• Phase Two: Understand the nature and content of the source
data.
• Phase Three: Design and develop the application that
re-engineers the data.
• Phase Four: Evaluate the results.

Figure 2-1 Overview of the Workflow

Important: The results of each phase influence the next one.

Overview of Phase One


Phase One of the workflow helps you:
• Translate high-level mission directives into specific data
re-engineering assignments.
• Make assumptions about the requirements and structure of the
target re-engineered data.
This phase is discussed in more detail later in this chapter.

Overview of Phase Two


Phase Two of the workflow helps you:
• Identify whether the source data has the basic structure that your
target data requires.
• Understand the content of the source data.
• Create the input data used in the next phase.
This phase is discussed in more detail later in this chapter.

Overview of Phase Three


In Phase Three of the workflow, you use specific QualityStage features
to design a QualityStage application that generates the target
re-engineered data you need. You then run this application on the
data produced in Phase Two.
This phase is discussed in more detail later in this chapter.

Overview of Phase Four


In this phase of the workflow, you look at the results of the process
and determine whether you need to:
• Revisit a previous phase.
• Refine some of the conditions.
• Repeat the process, starting from the phase you revisit.
If your re-engineering goals are simple, you may be satisfied with the
results of the first re-engineering application. Otherwise, you may
have to repeat this workflow, making different decisions and further
refinements with each iteration.
This phase is discussed in more detail later in this chapter.

Phase One: Understand the Business Goals


Let’s take a closer look at the workflow. Before you start designing
your data re-engineering project, you should understand the business
goals that are driving the data re-engineering need and how they
define your data re-engineering assignment (the effective goal).
Understanding the data mission that satisfies the business goal helps
you define the requirements and structure of the target data. It also
determines the level of data quality that your target re-engineered
data needs to meet. This insight will help you gain a sense of the
complexity of the intended re-engineered data and provide a context in
which you can make decisions throughout the workflow.

How High Quality Data Meets Business Goals


What you don’t know about your data can hurt your company and
affect its operations and revenues. Poor data quality
can occur when you don’t have a single profile of a customer because
the customer name was logged differently for several transactions
(James M. Hill, Jim Hill, J.M. Hill), and now you have multiple records for
the same person. Poor data quality can also occur when critical
information is buried in free-form fields or simply stored in the wrong
place (for example, social security numbers in the name field).
For example, look at the case of the merger of two companies. If the IT
department did not know that important customer data was entered
in various free-form fields, the conversion process could drop this data.
Users of this information would have incomplete customer records and
would be unable to handle customers effectively.
In another example, an extra character is found at the end of a name
field in some records. The letter c, it turns out, is a credit hold
indicator. This undocumented business practice could be lost if the c is
interpreted as part of the name, or even thrown away if interpreted as
an extraneous, expendable character. This type of character problem
by itself may seem minor. But multiplied across thousands of records,
it jeopardizes the quality of your data as well as the results of any
analysis using that data.
Data is high quality when it is up-to-date, complete, accurate, and
easy to use. Depending on your business goals, high quality data could
mean:
• Your customer records don’t include duplicate records for the same
person.
• Your inventory records don’t include duplicates for the same
materials.
• Your vendor records don’t include vendors you no longer use or
suppliers no longer in business.
• You can be confident that Paul Allen and Allen Paul are records for two
different customers, not the result of a data entry mistake.
• Your employees can find the data they need when they need it.
Confident that they are working with high quality data, they won’t
need to create their own individual version of a database.
• You are aware of patterns and relationships within your data that
you can use to analyze your organization and forecast trends.

Example of How Business Goals Determine Data Re-engineering
Requirements
Suppose a company wants to reduce costs associated with returned
products. Management wants to start by analyzing its customer
support data to determine which products have the highest rate of
return and why. The business goal is to use information to improve
the product or product mix, so the company management authorizes
the creation of a data warehouse and the use of data mining tools.
High quality data is vital to the mission of the data warehouse.
For you, the effective business goal would be to determine which
products have high returns and determine any patterns that could
indicate whether the product as a whole, or a particular aspect of the
product, is at issue. Therefore, your re-engineering assignment would
be to identify the fields and data that would uncover this information.
The data requirements and structure of the target re-engineered data
might need to accommodate at least:
• A product identifier.
• An indicator as to whether the product was returned.
• A way to determine the total number of returns for each product.
• A way to determine if returns are affected by seasonal issues.
• A way to capture a reason for returning the product.

Phase Two: Understand the Source Data


Phase Two helps you begin understanding the size and complexity of
the project for creating re-engineered data. If the granularity and
structure of the source data closely matches your initial impression of
the structure and requirements of the target data, then data
re-engineering will be less complex. The degree of difference
contributes to your project’s complexity.
Most organizations think they know what data they have. But if you
analyzed your data to determine how complete it is, how much of the
information is duplicated, and what types of anomalies exist within
each data field, you might be surprised. Over time, data integrity
weakens. The contents of fields stray from their original intent. The
label may say Name, but the field may also contain a title, or a Social
Security number, or a status, such as Deceased. This is all useful
information, but not if you can’t locate it.
Let’s take a closer look at the workflow for Phase Two.

Figure 2-2
Phase Two: Understand the Nature and Content of the Source Data

Step One: Prepare for QualityStage


Preparing for working in QualityStage entails:
• Having general knowledge about the information in the source
data.
• Knowing the file format of the source data.
• Preparing the data for working in QualityStage.

General Knowledge About the Source Data


You should know whether you’re dealing with customer information or
product information. You should also know the granularity of the
data, for example whether there are free-form fields and what
information is in them, or whether customer contact names use more
than one field (such as first name and last name).

File Format of Source Data


When you start a data re-engineering project, you don’t need to know
how the source data was collected, or the history of the source data.
But you do need to know its format and have a sense of its contents.
Does the data contain names and addresses, or does it contain
information about products? You’ll develop business rules for use
throughout the re-engineering process based on the data structure
and content.
The incoming data may come from a variety of sources, and/or it may
be in a variety of formats. QualityStage needs to know the structure of
the data — the data record definitions or the order and definition of
the parameters comprising each data object.
Your task is to create a flat data file that can be imported into
QualityStage. You need to understand the original file format, and
convert it, if necessary, to a format QualityStage supports.
For example, you need to know:
• How many fields each record contains.
• What label identifies each field.
• How long each field is.
• What type of data each field contains (for example, numbers or
free-form text).

Suppose you have a data file containing customer contact information,
where each record has the following fields:
• Name
• ID
• Address
• Phone
• Date Last Contacted
• Order History
• Outstanding Balance
Your application requires processing four of these fields:

Field     Starting   Field
Label     Position   Length   Field Contains
Name      1          20       Free-form text (customer name)
ID        40         10       Numeric (customer ID number)
Address   60         50       Free-form text (customer address)
Phone     120        10       Numeric (customer phone number)

By extracting this type of information about the source data, you
prepare for importing the data into QualityStage. You will need to
define these four fields for QualityStage.

Note: In this example, the fields are not sequential; there are gaps in
the starting position. You do not have to extract all fields from a
record, just the ones you want. In this case, four fields are being
extracted, but the original record contains many more that are
not being dealt with at this time.
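
To make the field definitions concrete, the following Python sketch slices the four fields out of a fixed-width record using the starting positions and lengths from the table above. It assumes that starting positions are 1-based (the Name field starts at position 1). The sample record contents are made up, and this is not how you define fields in QualityStage itself.

```python
# Field layout from the table above: label -> (starting position, length).
# Starting positions are assumed to be 1-based, as the table suggests.
LAYOUT = {
    "Name": (1, 20),
    "ID": (40, 10),
    "Address": (60, 50),
    "Phone": (120, 10),
}

def extract_fields(record, layout=LAYOUT):
    """Slice the selected fields out of one fixed-width record."""
    fields = {}
    for label, (start, length) in layout.items():
        fields[label] = record[start - 1:start - 1 + length].strip()
    return fields

# Build a sample 129-character record for demonstration.
sample = list(" " * 129)
for text, start in [("James M. Hill", 1), ("0000012345", 40),
                    ("10 Main St, Boston MA", 60), ("6175550100", 120)]:
    sample[start - 1:start - 1 + len(text)] = text
record = "".join(sample)

print(extract_fields(record))
```

The positions between the defined fields (for example, 21 through 39) are simply skipped, which mirrors the point made in the note: you define only the fields you want.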

Preparing Data for QualityStage


Before you start a data re-engineering project using QualityStage, you
must prepare the data for use by QualityStage. This is a two-step
process:

1. Working from the source data, create a flat data file and a file
definition (indicating metadata labels, field lengths, and starting
positions for each field).
2. Tell QualityStage where to find the data file.
Now you’re ready to investigate the data using QualityStage.

Step Two: Investigate the Source Data


Investigating helps you understand the quality of the source data and
clarify the direction of succeeding phases of the workflow. In addition,
it indicates the degree of processing you will need to create the target
re-engineered data.
By investigating data, you:
• Gain a better understanding of the quality of the data.
• Identify problem areas, such as blanks, errors, or formatting
issues.
• Prove or disprove any assumptions you may have had about the
data.
• Learn enough about the data to help you establish business rules
at the data level that you can use in the design and development of
your re-engineering application.
Investigating data identifies errors and validates the contents of fields
in a data file. The ability to investigate data is unique to QualityStage,
which looks at each record, field by field. You decide which fields to
investigate and which rule set to use (such as name for name data).
This process produces input data for Phase Three, where you build
your data re-engineering application.

Investigating data with QualityStage involves using the Investigation
phase to perform one or more of the following functions on your data:
• Organizing
• Parsing
• Classifying
• Analyzing patterns

Organizing Source Data


As part of this step, you need to decide how to organize the records
being investigated. For example, do you want all the records in one
file? What criteria do you use for deciding which records go together in
the same file?
When you organize data, you often add a unique identifier or record
key to each data record. This identifier allows QualityStage to track
individual records or instances of data through a project. You can also
add a source ID to identify which system each record came from.
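
A minimal Python sketch of this idea follows. The key format (an eight-digit sequence number) and the source names are assumptions chosen for illustration, not QualityStage conventions.

```python
from itertools import count

def add_record_keys(records, source_id, counter):
    """Attach a unique record key and a source ID to each record."""
    keyed = []
    for record in records:
        keyed.append({"record_key": f"{next(counter):08d}",
                      "source_id": source_id,
                      **record})
    return keyed

counter = count(1)  # one counter shared across all sources
crm = add_record_keys([{"name": "Jim Hill"}], "CRM", counter)
billing = add_record_keys([{"name": "J.M. Hill"}], "BILL", counter)
print(crm + billing)
```

Sharing one counter across sources keeps the record keys unique after the files are combined, while the source ID preserves where each record originated.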

Parsing Source Data


QualityStage parses fields on words or characters to identify or isolate
the data within each field. QualityStage parses:
• Single-domain fields, which contain one data entity.
• Multiple-domain or free-form data fields, which contain more than
one data domain.
A domain is a type of information where the data describes a specific
entity, such as name, street address, or Social Security number.
When parsing free-form or multiple domain data, QualityStage breaks
the data down to a finite level, such as a word or token. Eventually
QualityStage uses these tokens for matching by analyzing the
meaning of the data and classifying the data.

Classifying Source Data


In this step, you determine which rule sets QualityStage uses to
classify the data. QualityStage classifies tokens and assigns, in
general, one tag per token. Eventually these tokens are used by
QualityStage to standardize records.
QualityStage classifies tokens based on the pattern processes you
defined for the Investigation phase. Once QualityStage classifies data,
it identifies key or common tokens based on the frequency with which
they appear in the data.
The collection of tokens produces a pattern that represents a single
data record.
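
The parse, classify, and pattern steps can be sketched in Python as follows. The classification table and the one-character tags (T for street type, D for direction, N for numeric, A for an unclassified word, U for anything else) are invented for this example and are not taken from any QualityStage rule set.

```python
import re

# Hypothetical classification table: token -> one-character class tag.
CLASSIFICATIONS = {"ST": "T", "STREET": "T", "AVE": "T",
                   "N": "D", "NORTH": "D"}

def tokenize(text):
    """Split free-form text into uppercase word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text.upper())

def classify(token):
    """Assign one tag per token: a table class, N for numeric,
    A for an unclassified word, U for anything else."""
    if token in CLASSIFICATIONS:
        return CLASSIFICATIONS[token]
    if token.isdigit():
        return "N"
    if token.isalpha():
        return "A"
    return "U"

def pattern(text):
    """Concatenate the tags of all tokens into a record pattern."""
    return "".join(classify(t) for t in tokenize(text))

print(tokenize("123 N. Main Street"))
print(pattern("123 N. Main Street"))
```

Records that share a pattern have the same structure, so counting how often each pattern occurs shows which structures dominate the data and which are rare or unhandled.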

Analyzing Patterns in Source Data


You analyze the patterns to understand and verify your data content.
After classifying the data, you check whether there is any unhandled
data (data classified as unknown). You need to decide what rule sets to
use, or whether you need to customize a rule set to ensure all data is
handled.

Step Three: Evaluate the Results and Redefine the Project


Although evaluating your results is considered the final step of the
data re-engineering process, in a well-designed and developed
application you are evaluating each step of the way. The results of
each step may affect the direction you take in the following step.
Suppose that during the Investigation phase, when you learn more
about your data, you find that only two-thirds of your customer
records contain Social Security numbers. You can use this information
to tailor your investigation process—you may decide to re-investigate
the source data to determine other fields of interest.
At the end of your investigation process, your evaluation may reveal
that you need to change your business rules that rely on Social
Security numbers. Or you may decide to redefine the process of
collecting data, so that in the future you can ensure that this
important information is gathered and stored. In either case, you
could repeat the investigation step.
Think of the evaluation phase as a stepping stone to the next level,
whether that’s the next phase of your re-engineering process, or the
redefinition of your data collection process.

Phase Three: Design and Develop the Re-engineering
Application
Once you understand your business goals and your data, you’re ready
to design, develop, and execute your data re-engineering application.
Let’s take a closer look at the workflow for Phase Three.

Figure 2-3 Phase Three: Design and Develop the Data Re-engineering
Application

The data re-engineering application in QualityStage involves one or
more steps, including:
• Step One: Conditioning (standardizing) data
• Step Two: Matching data
• Step Three: Identifying surviving data and performing formatting
QualityStage provides sophisticated tools that make it easy for you to
complete these steps.

Step One: Conditioning the Source Data


Conditioning data ensures that the source data is internally
consistent, that is, each type of data has the same type of content and
format. Using internally consistent data is important when you match
data in the next step. If your data is already in a format that allows for
good matching, there may be no need to significantly change the data
and you may choose to skip this step.
For example, you need to condition the source data if you want all
designations for the word Street to be the same (such as ST.), all ZIP
codes to conform to either a five-digit or nine-digit code, all states to
use the two-character abbreviation, and all dates to be in the same
format, such as yyyy/mm/dd.
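
The kinds of rules in this example can be sketched in Python as shown below. The lookup tables and accepted input date formats are small, hypothetical stand-ins. In QualityStage, this work is done by rule sets in the Standardize stage, not by code like this.

```python
import re
from datetime import datetime

# Hypothetical lookup tables; a real rule set would be far larger.
STREET_TYPES = {"STREET": "ST.", "STR": "ST.", "ST": "ST."}
STATES = {"MASSACHUSETTS": "MA", "MASS": "MA", "NEW YORK": "NY"}
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d.%m.%Y")

def standardize_street(address):
    """Replace every recognized street designation with ST."""
    words = address.upper().replace(".", "").split()
    return " ".join(STREET_TYPES.get(w, w) for w in words)

def standardize_state(state):
    """Map a state name to its two-character abbreviation."""
    key = state.strip().upper()
    return STATES.get(key, key)

def standardize_zip(zip_code):
    """Return a five-digit or nine-digit code, or '' if invalid."""
    digits = re.sub(r"\D", "", zip_code)
    if len(digits) == 5:
        return digits
    if len(digits) == 9:
        return f"{digits[:5]}-{digits[5:]}"
    return ""

def standardize_date(text):
    """Convert a date in any known input format to yyyy/mm/dd."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y/%m/%d")
        except ValueError:
            continue
    return ""

print(standardize_street("10 Main Street"))
print(standardize_state("Mass"))
print(standardize_zip("021101234"))
print(standardize_date("08/15/2003"))
```

Values that cannot be standardized come back empty here so that they can be reviewed instead of being passed silently to matching.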

About Conditioning
Conditioning data involves moving free-form data into fixed fields and
manipulating data to conform to standard conventions. This process
identifies and corrects invalid values, standardizes spelling formats
and abbreviations, and validates the format and content of the data.
QualityStage uses the data classification generated during the
Investigation process to condition and standardize the data.

Decisions You Make


In this step, you use the Standardize stage to condition data. Before
conditioning data, you need to consider the following factors:
• Does the data cross domains (for example, the name data spills into
the address data field)? If so, you must separate the domains (the
name from the address data).
• Is there international data? If so, you may need to run different
processes based on the country.
• Are you standardizing names? If so, do you have individual names
or company names, or both? You must pass the relevant name
fields to a names rule set.
• Are you standardizing addresses? If so, you must pass the relevant
address fields to an address rule set.
• Are you standardizing city/state/zip? If so, you must pass the
relevant place/area fields to a place/area rule set.
• Are there other fields to standardize? If so, you must create a
relevant standardization rule set for those fields.
Using the QualityStage interface, you select the relevant fields and
rule sets to apply to those fields. QualityStage uses the same pre-built
tables, or rule sets, to standardize the data that it used to investigate
the data. You can run these rules out of the box or customize them to
handle the unique situations dictated by your business rules and
specific data needs.
Using the Standardize stage, you complete these steps:

1. Define your input file.
2. Specify the criteria on which to standardize the input data.
3. Define the output.

Step Two: Matching the Data


Once the data is conditioned, you’re ready for matching. You match
data to identify either duplicates or cross-references to other files.
Matching identifies all records in one file (the input file) that
correspond to similar records (such as a person, household, address,
event) in another file (the reference file). Matching also identifies
duplicate records in one file and builds relationships between records
in multiple files. Relationships are defined by business rules at the
data level.
With QualityStage, you use the Match stage to match data. This
procedure calculates a score based on the probability of a match
between two records. The scoring involves measuring how close
various records are in agreement, using thresholds. The thresholds,
which you set, define how acceptable a score must be in order to be
considered a match.
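
The scoring idea described above can be sketched in a few lines of Python. This is only an illustration of weight-based scoring compared against thresholds, not the algorithm the Match stage actually uses: the field names, the agreement and disagreement weights, and both cutoff values are hypothetical.

```python
# Hypothetical per-field weights: (agreement weight, disagreement penalty).
WEIGHTS = {
    "last_name": (4.0, -2.0),
    "first_name": (2.5, -1.5),
    "zip": (3.0, -1.0),
    "birth_date": (5.0, -3.0),
}

MATCH_CUTOFF = 8.0     # assumed: at or above this score, the pair is a match
REVIEW_CUTOFF = 3.0    # assumed: between the cutoffs, the pair needs review


def score_pair(rec_a: dict, rec_b: dict) -> float:
    """Sum agreement weights and disagreement penalties over the compared fields."""
    total = 0.0
    for field, (agree, disagree) in WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue  # a missing value contributes nothing either way
        total += agree if a == b else disagree
    return total


def classify(score: float) -> str:
    """Compare a composite score with the two thresholds."""
    if score >= MATCH_CUTOFF:
        return "match"
    if score >= REVIEW_CUTOFF:
        return "review"
    return "nonmatch"


a = {"last_name": "SMITH", "first_name": "JOHN", "zip": "02116", "birth_date": "1960-05-01"}
b = {"last_name": "SMITH", "first_name": "JON", "zip": "02116", "birth_date": "1960-05-01"}
result = classify(score_pair(a, b))
```

Raising the cutoffs makes the classification stricter, so fewer pairs are accepted as matches; lowering them has the opposite effect.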
Using the Match stage, you complete three steps:

1. Define your input file.


2. Specify how you want input records to match. You can indicate
which fields are important, how to group records, and which fields
to use for weights and penalties.
3. Define the output.
Depending on the source data, you can perform multiple matching
passes. You can also decide whether each pass is independent or
dependent. With a dependent pass, you can choose to exclude matches
from a previous pass in the succeeding pass.
Your business rules determine these decisions, and your data
determines how many passes you may need and how you need to
group the data. By evaluating the results of the previous phases, you
can determine the appropriate matching strategy for your application.

Example of Matching Data


Let’s say you are working with name and address data and you are
trying to identify duplicate records.

1. You’ve defined your input file.
2. Specify how you want input records to match. Your data-level
business rules define what you mean by a duplicate record.
• Use additional fields to refine the match criteria. For instance,
records with the same name at the same address are
considered duplicates. If your records contain additional
information, such as date of birth and Social Security number,
you can use those fields to refine your matching criteria.
• You want to match on more than one field and you want to do
this iteratively. For instance, the first time QualityStage will
group on Social Security number and the second time it will
use the ZIP code and date of birth fields.
• You specify that each iteration is dependent on the previous
iteration. For example, if during the first matching attempt
QualityStage groups on Social Security number, the second
time it will not include those records that matched during the
first iteration.
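
The dependent, multi-pass grouping in this example can be sketched as follows. This is a simplified illustration that only groups records whose grouping fields are identical; it does no scoring, and the field names (id, ssn, zip, dob) and function names are hypothetical.

```python
from collections import defaultdict


def run_pass(records, key_fields, already_matched):
    """Group unmatched records by the given fields; return groups of two or more ids."""
    blocks = defaultdict(list)
    for rec in records:
        if rec["id"] in already_matched:
            continue  # dependent pass: skip records matched in an earlier pass
        key = tuple(rec.get(f) for f in key_fields)
        if all(key):  # ignore records that lack a value for any grouping field
            blocks[key].append(rec["id"])
    return [ids for ids in blocks.values() if len(ids) > 1]


def match_in_passes(records, passes):
    """Run each pass in order, excluding records matched by previous passes."""
    matched, groups = set(), []
    for key_fields in passes:
        new_groups = run_pass(records, key_fields, matched)
        groups.extend(new_groups)
        for ids in new_groups:
            matched.update(ids)
    return groups


records = [
    {"id": 1, "ssn": "111", "zip": "02116", "dob": "1960-05-01"},
    {"id": 2, "ssn": "111", "zip": "02116", "dob": "1960-05-01"},
    {"id": 3, "ssn": None, "zip": "02116", "dob": "1960-05-01"},
    {"id": 4, "ssn": "222", "zip": "10001", "dob": "1975-01-15"},
    {"id": 5, "ssn": None, "zip": "10001", "dob": "1975-01-15"},
]

# Pass 1 groups on Social Security number; pass 2 on ZIP code and date of birth.
groups = match_in_passes(records, [("ssn",), ("zip", "dob")])
```

Records 1 and 2 match in the first pass, so the second pass does not consider them. Record 3 therefore stays unmatched, while records 4 and 5 are grouped in the second pass.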


Step Three: Determining Surviving Records and Formatting


In this step of the re-engineering workflow, you identify which records
(or fields of a set of duplicate records) from the match data survive and
become available for formatting, loading, or reporting.
In this step, when you have duplicate records, you must decide
whether to:
• Keep all the duplicates.
• Keep only one record that contains all the information that is in
the duplicates.
With QualityStage, you use the Survive stage to resolve conflicts with
records that pertain to one entity. In addition, it lets you optionally
create a cross-reference table to link all surviving records to the legacy
source.

Keeping All Duplicate Records


Depending on the business processes that use this data, and your
company’s business rules and data quality management standards,
you may need to keep all duplicate records. For example, legal reasons
may dictate that you keep all the original source data, even if it
contains duplicates.
If you choose to keep all duplicate records, you can either:
• Link them together with a common key.
• Standardize them so that they look the same, and then link them
together.

Keeping Only One Record


If you choose to keep one record, you can either:
• Select one record.
• Combine the data from multiple records to form a single record.
In either case, you keep what is known as the best record. Best is
defined by your data quality business rules. You can select the one
record to survive based on the following criteria:
• Chronology (for example, choose the most recent record).
• Source (depends on the system of record).
• Frequency (for example, the same value appears in the same field
in multiple records).
• Most complete (for example, choose a record with full names as
opposed to a record with only initials).
Your criteria can be to take the “best” fields to create the new record.
Again, “best” can be the most complete, or most recent, or whatever
your business rules dictate.
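
The two approaches above, selecting one record or building a record from the best fields, can be sketched as follows. This is a minimal illustration, not the logic of the Survive stage itself: the field names and the specific rules (longest value as "most complete", most common value as "most frequent") are assumptions chosen for the example.

```python
from collections import Counter


def most_recent(records):
    """Select one survivor: the record with the latest date (ISO dates sort as text)."""
    return max(records, key=lambda r: r["updated"])


def most_complete_value(values):
    """Prefer the longest non-empty value, such as a full name over initials."""
    candidates = [v for v in values if v]
    return max(candidates, key=len) if candidates else ""


def most_frequent_value(values):
    """Prefer the value that appears most often across the duplicates."""
    candidates = [v for v in values if v]
    return Counter(candidates).most_common(1)[0][0] if candidates else ""


def build_best_record(records, rules):
    """Combine fields from several duplicates, applying one rule per field."""
    return {field: rule([r.get(field, "") for r in records])
            for field, rule in rules.items()}


duplicates = [
    {"name": "J SMITH", "city": "BOSTON", "updated": "2001-03-10"},
    {"name": "JOHN SMITH", "city": "BOSTON", "updated": "2002-07-22"},
    {"name": "JOHN Q SMITH", "city": "BOSTN", "updated": "2000-11-05"},
]

survivor = most_recent(duplicates)
best = build_best_record(
    duplicates, {"name": most_complete_value, "city": most_frequent_value}
)
```

Here `survivor` is the record updated in 2002, while `best` takes the fullest name from one record and the most common city spelling from the others.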

Formatting
The formatting part of the consolidating activity involves defining the
output. For example, you can:
• Define the order of fields in an output record.
• Create initial database load files or transaction input records.
• Create an initial production job stream.
The actual results of the formatting task depend on your goals, but
some possibilities are:
• Creating a file for each table.
• Creating from or to cross-reference tables.
• Creating data exception reports.

Phase Four: Evaluating Results


In a well-designed data re-engineering application, you evaluate the
results of each phase as you complete it. The results of one step may
impact the direction you take in the next step. This is a process of
fine-tuning a data re-engineering application to achieve the highest
quality data.
At the end of your re-engineering application, you can evaluate the
entire process. What did you learn about your data? What did you
learn about your re-engineering process? Or about your data collection
process? Your evaluation may help you make changes to your next
data re-engineering project, help you apply the QualityStage jobs, or
help your company make changes to its business rules or even to its
business goals.
Evaluating the results of your re-engineering process can help your
organization maintain data management and ensure that corporate
data supports the company’s goals.



3

Using the QualityStage Development Environment

This chapter describes the QualityStage data re-engineering
environment. Specifically, it describes how to:
• Install QualityStage Designer
• Start QualityStage Designer
• Use the QualityStage main window
• Set QualityStage Designer options

Sample projects and data files
QualityStage provides three sample projects and the data files used
with these projects. This chapter refers to these sample projects,
which you can use to learn how to work in QualityStage. When you
install the QualityStage Designer, QualityStage creates a directory,
DATA_Samples, in the directory where you installed QualityStage.

Note: For the UNIX server version, the installation CD includes a
samples.tar file that contains these sample files. For the OS/390
server version, these files are installed automatically in the data
repository.


Installing QualityStage Designer


Install QualityStage Designer on a Windows workstation.

Important: You must have Windows administrator privileges on your
machine before you can install QualityStage Designer. Without
these privileges, critical files cannot be installed into the system
registry.

To install:

1. Insert the CD in the CD-ROM drive. If the set-up directory doesn’t
appear, browse the CD-ROM for it.
2. Run Setup.exe from your CD-ROM drive and follow the
instructions.

Note: You are prompted to select the components of QualityStage
Designer you want to install.

WAVES Rules
The WAVES Rules are required for:
• The Multinational Standardize stage
• The QualityStage WAVES stage
You should clear the Install WAVES Rules check box only if you do
not plan to use either the Multinational Standardize stage or the
QualityStage WAVES stage.

Starting QualityStage Designer


To start QualityStage Designer:
1. Do one of the following:
• From the Start menu choose Ascential ➤ QualityStage
Designer.

• Double-click the QualityStage icon on your desktop.

When you first start QualityStage Designer, it creates the
QualityStage Designer database by copying the sample database.
The following window appears.

2. Click OK.
The QualityStage main window appears.

The sample database contains a sample project named QUALITY.


Using the QualityStage Main Window


The QualityStage main window has three panes:
• Menu and toolbar pane. The menus give access to the QualityStage
wizards and dialog boxes. The toolbar gives access to commonly
used commands.
• Left pane. The left pane displays the contents of the currently
active QualityStage repository in a hierarchical tree format. The
left pane lets you navigate QualityStage projects and their
components in the QualityStage repository.
• Right pane. The right pane lists the contents of whatever
QualityStage component is currently selected in the left pane
(QualityStage datafile definitions, stages, jobs, etc.).
The QualityStage main window gives you access to all QualityStage
features.
You can resize the main window and its columns. When you open
QualityStage again, it retains the size and positions you set.
Use the main window to manage your projects. With this window, you
can access:
• QualityStage projects
• QualityStage jobs
• QualityStage stages
• QualityStage data file definitions (including all defined fields)

Using the QualityStage Menu Bar


The menu bar on the QualityStage main window has five menus:
• File menu
• Edit menu
• View menu
• Rules menu
• Help menu

Note: The availability of menu options varies, depending on what you
select in the left or right pane. Options that are not available for
the selected item are greyed out.

File Menu
The File menu lists the following commands:
• Open Repository. Use this command to open another
QualityStage repository. (When you launch QualityStage
Designer, QualityStage always opens the most recently used
repository.)
• Import. Use this command to import:
– A project from an IMF file
– Datafile definitions:
• From a COBOL copybook
• From an ODBC file definition
• From Visual Warehouse
• Via MetaBrokers
See “Importing Projects” on page 4-8 for information about
importing projects.
• Export. Use this command to export:
– A project to an IMF file
– A datafile definition via MetaBrokers
– A job to Visual Warehouse
See “Exporting Projects” on page 4-4 for information about
exporting projects.
• Run profiles. Use this command to create, modify, or delete run
profiles. See Chapter 5, “Setting Up Run Profiles” for information
about setting up run profiles.
• Reports. Use this command to set up and run QualityStage
reports. See Chapter 15, “Working with QualityStage Reports”, for
information about QualityStage reports.

• Designer Options. Use this command to specify:
– QualityStage Designer options
– Data Warehouse Center options
See “Setting QualityStage Designer Options” on page 3-12 for
information about QualityStage Designer options. See the
QualityStage Guide to the Data Warehouse Center Interface for
information about the Data Warehouse Center.
• Exit. Use this command to exit QualityStage Designer.

Edit Menu
The Edit menu lists the following commands:
• Cut. Use this command to delete an item selected in the right pane.
• Copy. Use this command to copy the selected item to the clipboard.
• Paste. Use this command to paste the item currently in the
clipboard.

View Menu
The View menu lists the following commands:
• Tool Bar. Use this command to display or hide the QualityStage
tool bar.
• Status Bar. Use this command to display or hide the status bar.
• Ascential Banner. Use this command to display or hide the
Ascential banner just below the title bar.

Rules Menu
The Rules menu lists the following commands:
• Standardize Rules Management. Use this command to create,
modify, or delete Standardize stage rule sets. See “Managing the
Rule Sets and Files” on page 10-32 for information about
Standardize rule sets.

• Standardize Overrides. Use this command to modify
Standardize rule set behavior using rule override tables. See
Appendix E, “Customizing and Testing Rule Sets”, for information
about overriding rule sets.
• Standardize Rules Analyzer. Use this command to test
Standardize stage rule sets. See “Testing Standardization Rule
Sets” on page E-34 for information about the QualityStage Rules
Analyzer.
• WAVES/Multinational Standardize Overrides. Use this
command to modify WAVES rule set behavior using rule override
tables. See Appendix E, “Customizing and Testing Rule Sets”, for
information about overriding rule sets.

Help Menu
The Help menu lists the following commands:
• Online Help. Use this command to display the QualityStage
online help system.
• User Guide. Use this command to display the QualityStage
Designer User Guide in Acrobat Reader.
• Stages Guide. Use this command to display the QualityStage
Stages Reference Guide in Acrobat Reader.
• Getting Started. Use this command to display the QualityStage:
Getting Started guide in Acrobat Reader.

Using the Left Pane of the QualityStage Main Window


The left pane uses the familiar Windows Explorer model to display the
contents of the current QualityStage repository. The contents are
displayed in a hierarchical tree format.
When you expand the repository folder, all currently defined
QualityStage projects are listed as folders.


Click the plus sign next to a project folder to expand its contents. Each
project folder contains the following three subfolders:
• Datafile Definitions. When you select a Datafile Definitions
folder, the right pane displays all datafile definitions defined for
this project. Click the plus sign next to this folder to expand the list
of datafile definitions in the left pane.
– When you select a datafile definition in the left pane, the right
pane displays all fields defined for this datafile.
• Stages. When you select a Stages folder, the right pane displays all
stages defined for this project.
• Jobs. When you select a Jobs folder, the right pane displays all jobs
defined for this project. Click the plus sign next to this folder to
expand the list of jobs in the left pane.

Using the Right Pane of the QualityStage Main Window


The right pane uses the familiar Windows Explorer structure to list
the contents of whatever item you select in the left pane:

In the left pane, if you select …    The right pane lists …

The QualityStage repository          All projects in the repository
A project                            The Datafile Definitions, Stages, and
                                     Jobs folders for the project
A Datafile Definitions folder        All datafile definitions for the project
A Stages folder                      All stages defined for the project
A Jobs folder                        All jobs defined for the project
A datafile definition                All fields defined for the datafile

Drag and drop
You can use drag-and-drop operations to copy or move items listed in
the right pane to appropriate folders that are listed in the left pane.
To drag and drop multiple adjacent items, hold down the SHIFT key
while you click items to select them.
To drag and drop multiple items that are not adjacent, hold down the
CTRL key while you click items to select them.
The following table lists the available drag-and-drop operations:

By dragging and dropping,        To …
you can copy or move …

a datafile definition            a Project folder
                                 a Datafile Definitions folder
a datafield definition           a datafile definition
a stage                          a Project folder
                                 a Stages folder
                                 a job in the same project as the
                                 stage
a job                            a Project folder
                                 a Jobs folder

Note: You cannot directly copy or move a stage from one project to a job
in another project. To do this, first copy or move the stage from
the source project to the target project, and then add the stage
you copied or moved to a job in the target project.

Right-clicking
In the right pane, you can:
• Right-click anywhere on the pane, or
• Right-click on a particular item
Right-clicking displays a shortcut menu with commands appropriate
to the item you right-clicked on.

Deleting items
In the right pane, you can delete datafile definitions, datafield
definitions, stages, jobs, and projects.

Using the QualityStage Toolbar


Use the QualityStage toolbar to:
• Create new:
– Projects
– Datafile definitions
– Datafield definitions
– Stages
– Jobs
• Cut, copy, and paste items listed on the right pane
of the QualityStage main window.
• Change the display characteristics of the right pane. You can
display:
– Large icons
– Small icons
– Details
• Run the job selected on the right pane of the QualityStage
main window.

Using QualityStage Dialog Boxes


QualityStage provides several different ways to select or move items
within dialog boxes, and to locate additional menus.


Selecting Items
There are two methods of selecting multiple items in a list:
• To select multiple nonadjacent items, hold down the CTRL key and
click the items to select them.
• To select multiple adjacent items, hold down the SHIFT key and
click the items to select them.

Moving Items
The drag and drop method of moving items can be used for some lists
in the following ways:
• You can drag and drop any item in a list to change its position in
the list.
• You can drag and drop from one list to another any item that is
highlighted when selected.
When you move an item, the item is highlighted in the list, and the
two items between which you are placing it appear in bold text.
A move icon indicates that the item can be moved. If the move icon
does not appear, you cannot alter the position of the item in the list
while in that dialog box.
In addition to using the drag and drop method in some dialog boxes,
you can also use the Move Up and Move Down buttons to move items.

Additional Menus
In addition to the menus in the menu bar, there are several menus
accessible from within the dialog boxes themselves. Right-click an
item to bring up any additional menus specific to that item.


Browsing
Some dialog boxes contain locations for a specific file or directory. You
can browse for the correct location by using the browse button on
the right side of the entry. Here is an example of the Designer Options
dialog box in which the browse button for Local Working Directory is
selected:

The Browse for Folder dialog box appears.

Setting QualityStage Designer Options


QualityStage Designer provides a Windows-based user interface for
defining and building your QualityStage jobs.
The files and directories you work with have a default location. You
can change these locations by setting the Designer options.
These options affect your QualityStage Designer environment. They
include:
• Changing the location of your local working directory
• Changing the location of your Standardize stage files
• Specifying a default import directory
• Specifying QualityStage Interface to Data Warehouse Center
directories for import and extensions
• Specifying a preferred editor

Local Working Directory


Each time you deploy a job, QualityStage generates files containing
instructions for building the script or the JCL to be executed on the
server. To store these files, QualityStage creates a directory for each
project in the local working directory.
This directory is located on the QualityStage Designer host. By
default, QualityStage uses the directory where you installed
QualityStage Designer as the local working directory. You can change
this directory.

Standardize Process Definition Directory


By default, the Standardize stage uses process definitions (Rule Sets)
that are installed with QualityStage Designer. If you want to use Rule
Sets from another location, enter the full path to that directory.

Default Import Directory


If you intend to import QualityStage projects developed using earlier
releases of QualityStage (or INTEGRITY), you can define a default
directory for the Interchange Metadata Format (IMF) files. See
“Importing Projects” on page 4-8 for more information.


Preferred Editor
By default, QualityStage uses Notepad for its text file editor. If you
prefer to use another editor, enter the path to the editor of your choice.
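
The Preferred Editor setting accepts either a full path or a bare file name that is found through the PATH environment variable. The following sketch shows one way to check, outside QualityStage, whether a given value would resolve to an executable. The function name `resolve_editor` is hypothetical, and the lookup QualityStage performs internally may differ; the sketch relies only on Python's standard `shutil.which`.

```python
import os
import shutil


def resolve_editor(setting: str, default: str = "notepad"):
    """Return the path of the editor executable, or None if it cannot be found.

    An empty setting falls back to the default editor (Notepad, per this guide).
    """
    name = setting.strip() or default
    if os.path.dirname(name):
        # A value that includes a directory is treated as a full path.
        return name if os.path.isfile(name) else None
    # A bare file name is searched for in the directories listed in PATH.
    return shutil.which(name)


missing = resolve_editor("no_such_editor_xyz_123")
```

If the function returns None for the value you plan to enter, check the path or add the editor's directory to PATH before setting the option.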

Data Warehouse Center


For information about Data Warehouse Center options, see the
QualityStage Guide to the Data Warehouse Center Interface. See also
“Using the Interface to Data Warehouse Center” on page 4-29.

How to Set Designer Options


To set QualityStage Designer options:

1. Select File ➤ Designer Options.

The Designer Options dialog box appears and displays the General
tab:

2. Under Local Working Directory, enter the full path for an
alternative directory for creating and storing run descriptions and
control members on the QualityStage Designer host.

Note: This is optional and not a project directory. Unless you are
very short on disk space, do not change this directory.

3. Under Standardize Process Definition Directory, enter the full
path to this directory; for example:
c:\Ascential\QualityStageDesigner70\RULES
4. Under Default Import Directory, enter the full path to the
directory in which you maintain IMF files.


5. Under Preferred Editor, enter the file name of the editor you want
to use with QualityStage. You can:
• Enter the full path to the editor executable file.
• Enter the file name of any editor executable file located in a
directory included in your PATH environment variable.

• Use the browse button to locate an editor executable file.
If you leave this field empty, QualityStage uses Notepad as its
default editor.
6. By default, QualityStage saves passwords to ODBC data sources
in the QualityStage repository. If you do not want your ODBC
password saved, clear the Save ODBC Passwords check box.
For information about the Data Warehouse Center options, see “Using
the Interface to Data Warehouse Center” on page 4-29.


4

Working with Projects

This chapter describes how to:


• Create QualityStage projects
• Export and import QualityStage projects
• Import a COBOL copybook
• Create datafile definitions
• Use extended features such as:
– Interface to Data Warehouse Center
– Licensed stages such as QualityStage CASS, QualityStage
Z4Change, QualityStage SERP, and QualityStage WAVES

Creating QualityStage Projects


In Chapter 2 we talked about designing and developing a data
re-engineering application. In QualityStage, a data re-engineering
application is called a project. Projects are the means by which you
organize the QualityStage work environment.
You define data files and stages, and you build jobs within a specific
project. QualityStage uses these projects to create and store files.
When creating a project, you can either add a new project or copy an
existing project. If you copy an existing project, QualityStage copies
the datafile definitions, stages, and jobs. You can then edit these
datafile definitions, stages, and jobs to create your own project.
If you are sharing a data repository with other QualityStage Designer
clients, only one user can access a project at a time, but different users
can access different projects at the same time.

Adding a Project
To add a project:
1. Do one of the following:
• On the toolbar, click the button for creating new items, and then
click Project.
• On the QualityStage main window:
a. On the left pane, select the QualityStage repository.
b. On the right pane, right-click anywhere, and then click
New Project.
The Add a New Project dialog box appears.
2. Under Name, enter up to eight alphanumeric characters for the
project name.
3. Under Description, enter up to 40 alphanumeric characters for the
project description.
4. Click OK.
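
The naming limits in the steps above can be checked before you type a name into the dialog box, for example when you plan names for many projects. The sketch below is an illustration only: the function name is hypothetical, it assumes "alphanumeric" means the ASCII letters and digits, and it checks only the length of the description, on the assumption that spaces are acceptable there.

```python
import re

NAME_PATTERN = re.compile(r"[A-Za-z0-9]{1,8}")  # assumed meaning of "alphanumeric"
DESCRIPTION_MAX = 40


def validate_project(name: str, description: str) -> list:
    """Return a list of problems; an empty list means both values fit the limits."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must be 1 to 8 alphanumeric characters")
    if len(description) > DESCRIPTION_MAX:
        problems.append("description must be 40 characters or fewer")
    return problems


ok = validate_project("QUALITY", "Sample project")
too_long = validate_project("CUSTOMERS2003", "Customer cleanup")
```

The same limits apply when you name a copied or imported project.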


Copying a Project
To copy an existing project:

1. On the right pane of the QualityStage main window, select the
project you want to copy.
2. Do one of the following:
• On the toolbar, click the Copy button, and then click the Paste
button.
• From the Edit menu, click Copy, and then click Paste.
• On the right pane, right-click the project you want to copy,
click Copy, and then right-click anywhere on the right pane
and click Paste.
The Provide Name for New Project dialog box appears.
3. Under Name, enter up to eight alphanumeric characters.
4. Under Description, enter up to 40 characters.
5. Click OK.

Deleting Projects
To delete a project:

1. On the left pane of the QualityStage main window, select
PROJECTS.
2. On the right pane of the QualityStage main window, right-click
the project you want to delete, and then click Delete.

Exporting Projects
When you want another user to work with the same project or one
created using a previous version of QualityStage, you can export the
project so that it can be loaded onto the other user’s client machine. You
can export projects from one QualityStage Designer client and import
them as projects into another QualityStage Designer client.
During an export, QualityStage creates a single Interchange
Metadata Format (IMF) file of the project. You can then move this file
to any QualityStage Designer host running the same version or a later
version.
To export a project:

1. On the right pane of the QualityStage main window, select the
project you want to export.
2. Do one of the following:
• Select File ➤ Export ➤ Project ➤ To IMF.
• Right-click the selected project on the right pane, and then
click Export.
The Select Output File dialog box appears.
3. Using the standard navigation, select the desired destination for
the IMF file.
4. Next to File name, enter a filename.
5. Click Save.
A window appears displaying a progress bar indicating the status
of the export.
When the export finishes successfully, the following message box
appears.
6. Click OK.


Exporting Datafile Definitions


You can export datafile definitions from a QualityStage project:
• To Ascential MetaStage
• To other data warehousing tools via Ascential MetaBrokers
For information about MetaStage, see the MetaStage User’s Guide.
For information about MetaBrokers, see the MetaBroker technical
bulletin for the MetaBroker you are using. Each MetaBroker has its
own technical bulletin.

Exporting Datafile Definitions via MetaBrokers


To export datafile definitions in a QualityStage project through a
MetaBroker to another data warehousing tool:

1. On the QualityStage main window, choose File ➤ Export ➤
Datafile Definition ➤ Via MetaBrokers.
The Select MetaBroker dialog box appears.
2. Select the MetaBroker you want to use, and then click OK.
The Parameter Selection dialog box appears.
3. Do the following:
a. Select the Verbose check box to make the Status dialog box
display the status of each converted datafile definition.
If you clear the Verbose check box, only a brief list of messages
appears.
In either case, a full set of messages is written to the log file.
b. Next to Log File, enter the path of the log file to create, or click
the browse button to browse for the file.
c. Click OK.
The Status dialog box appears. The MetaBroker decodes the
QualityStage datafile definitions and writes them to a temporary
file for export.
4. When decoding is complete, do one of the following:
• Click Select All to export all datafile definitions.
• Click Filter to open the Meta Data Selection dialog box. You
can filter out some of the datafile definitions. For detailed
instructions on filtering, click Help.
When you are done, click OK. The Parameter Selection dialog box
appears.
5. Enter the path of the file to which you want to export datafile
definitions, or use the browse button to browse for the file.
See the technical bulletin for the MetaBroker you are using for
detailed information about completing this dialog box. When you
finish entering parameters, click OK.
The Status dialog box appears and lists the progress of the export.
6. Click Finish after the export is complete.

Exporting Datafile Definitions to MetaStage


To export all datafile definitions in a QualityStage project to
MetaStage:

1. On the left pane of the QualityStage main window, select the
project whose datafile definitions you want to export.
2. Choose File ➤ Export ➤ Datafile Definition ➤ To MetaStage.
The MetaStage Attach dialog box appears.
3. Enter the required connection information to connect to a specific
directory, and then click Current.
A new import category, containing the entire contents of the
project as viewable through the QualityStage MetaBroker, is
created in the specified MetaStage directory. The import category
is named QualityStageProjectName–Timestamp (for example,
Oracle1–4/21/2003 2:24:14 PM).
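
If you need to predict or search for an import category name, the naming pattern above can be reproduced with a short sketch. The function name is hypothetical, and the exact separator character and timestamp layout are inferred from the single example given here, so treat both as assumptions.

```python
from datetime import datetime


def import_category_name(project_name: str, when: datetime) -> str:
    """Build a name of the form ProjectName–M/D/YYYY h:mm:ss AM/PM."""
    hour12 = when.hour % 12 or 12            # 0 and 12 both display as 12
    meridiem = "AM" if when.hour < 12 else "PM"
    stamp = (f"{when.month}/{when.day}/{when.year} "
             f"{hour12}:{when.minute:02d}:{when.second:02d} {meridiem}")
    return f"{project_name}\u2013{stamp}"    # \u2013 is the dash shown in the example


name = import_category_name("Oracle1", datetime(2003, 4, 21, 14, 24, 14))
```

With the date and time from the example, `name` is the same string as the category name shown above.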

Note: Because QualityStage truncates long names, identical metadata
objects imported into MetaStage from QualityStage and another
product may have different names.

Importing Projects
When you need to use a project created in an earlier version of
QualityStage (or INTEGRITY), or someone else’s project, you import
that project. You can import QualityStage projects that were exported
from another QualityStage Designer client.
To import a project, you need to create an Interchange Metadata
Format (IMF) file of the project and make it available on the
QualityStage Designer client host. QualityStage uses the IMF file and
re-creates the data file definitions, the jobs, and the stage definitions.
If you are importing a project from an MVS/UNIX host, refer to the
QualityStage UNIX, Linux, and Windows Server Guide for instructions on
how to create the IMF file.
To import a project converted to an IMF file:

1. On the left pane of the QualityStage main window, select the


QualityStage repository.
2. Do one of the following:
• Select File ➤ Import ➤ Project ➤ From IMF.
• Right-click anywhere on the right pane, and then click Import.
The Import Project dialog box appears.


3. Under Name, enter up to eight alphanumeric characters.


4. Under Description, enter up to 40 alphanumeric characters.
5. Click OK.
The Select Import File dialog box appears. If you specified a default
import directory in the Designer Options dialog box, the Select Import
File dialog box opens in that directory.

6. Using the standard navigation, select the desired IMF file, and
then click Open.
Your imported project appears in the project list on the right pane
of the QualityStage main window.

Importing a COBOL Copybook


A COBOL Copybook is a file definition. You can import one to use as
your datafile definition in QualityStage.
By importing a COBOL Copybook, you can quickly create a file
definition, including all necessary fields. This is useful when you have
a lot of files that you want to load into QualityStage.


You can also use a COBOL Copybook to ensure that your file
definitions are consistent and accurate; by importing the definitions,
you can avoid manual entry and any possible keying errors.
When QualityStage imports a COBOL Copybook, it automatically
creates a new project. You can then add the newly imported datafiles
to an existing project if you want to.
To import a COBOL Copybook:

1. On the QualityStage main window, select a project.


2. Select File ➤ Import ➤ Datafile Definition ➤ From Cobol
Copybook.
The Import COBOL Copybook dialog box appears:

3. Enter the name and description of the new project.
Sequence numbers must be removed from the COBOL file before it can
be imported. By default, QualityStage strips these numbers during
the import. If they have already been removed, you can clear the
Strip Sequence Numbers check box.
4. Click OK.


The Select COBOL Copybook File dialog box appears:

5. Select the COBOL Copybook to import, and then click Open.


The Specify Datafile Name dialog box appears:

6. Enter the name and description of the data file that the COBOL
Copybook will define.
7. Click OK.
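
If you prefer to strip sequence numbers yourself before importing, the following Python sketch shows one way to do it. It assumes the conventional fixed-format COBOL layout (sequence numbers in columns 1 through 6, an identification area in columns 73 through 80). The function name is hypothetical, and this is not a description of how QualityStage itself strips the numbers.

```python
def strip_sequence_numbers(line: str) -> str:
    """Blank out columns 1-6 and drop columns 73-80 of one copybook line."""
    body = line.rstrip("\r\n")
    kept = body[6:72]                    # columns 7-72: indicator and code areas
    return (" " * 6 + kept).rstrip()     # keep the code in its original columns
```

Blanking the first six columns, rather than deleting them, leaves the remaining text in the columns where a COBOL parser expects to find it.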


Importing Datafile Definitions


You can import datafile definitions into a QualityStage project from:
• Ascential MetaStage
• Other data warehousing tools via Ascential MetaBrokers
For information about MetaStage, see the MetaStage User’s Guide.
For information about MetaBrokers, see the MetaBroker technical
bulletin for the MetaBroker you are using. Each MetaBroker has its
own technical bulletin.

File names, field names, and field lengths
When importing datafile definitions via a MetaBroker or from
MetaStage, the following conditions apply:
• If an imported file name is longer than 8 characters, QualityStage
Designer truncates the name to 8 characters and stores the full file
name in the file description.
• If an imported field name is longer than 7 characters, QualityStage
Designer truncates the name to 7 characters and stores the full field
name in the field description.
• If a file name or field name contains characters that QualityStage
cannot use (for example, an underscore), QualityStage Designer
replaces the name with BADNAMEn and stores the original name in
the file or field description.
• If no field lengths are defined in the import file, QualityStage
Designer sets the field length to 0.
When you re-export such datafile definitions via a MetaBroker, the
MetaBroker uses the original name.
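
The naming conditions above can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and it assumes that usable characters are ASCII letters and digits, that the n in BADNAMEn counts up from 1, and that the invalid-character rule is checked before the length rule. None of these details is confirmed by this guide.

```python
def import_name(original: str, max_len: int, bad_count: int) -> tuple[str, str, int]:
    """Return (stored name, description text, updated BADNAME counter)."""
    if not (original.isascii() and original.isalnum()):   # assumed set of usable characters
        bad_count += 1
        return f"BADNAME{bad_count}", original, bad_count
    if len(original) > max_len:
        return original[:max_len], original, bad_count    # full name goes to the description
    return original, "", bad_count
```

Call it with max_len set to 8 for file names and 7 for field names.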

Importing Datafile Definitions via MetaBrokers


To import data definitions through a MetaBroker into QualityStage:

1. On the QualityStage main window, choose File ➤ Import ➤
Datafile Definition ➤ Via MetaBrokers.
The Import Data Definition via MetaBroker dialog box appears.


2. Enter the name of the project you want to create in QualityStage,
and then click OK.
The Select MetaBroker dialog box appears.
3. Click the MetaBroker you want to use for importing, and then
click OK.
The Parameter Selection dialog box appears.

4. Enter the name of the file to import, or use the browse button to
locate the file.
See the technical bulletin for the MetaBroker you are using for
detailed information about completing this dialog box. When you
finish entering parameters, click OK.
The Status dialog box appears. The MetaBroker decodes datafile
definitions and writes them to a temporary file.
5. Do one of the following:
• Click Select All to select all datafile definitions.
• Click Filter to open the Meta Data Selection dialog box. You
can filter out some of the datafile definitions. For detailed
instructions on filtering, click Help.
When you are done, click OK. The Parameter Selection dialog box
appears.
6. Do the following:
a. Select the Verbose check box to make the Status dialog box
display the status of each imported datafile definition.
b. Next to Log File, enter the path of the log file to create, or click
the browse button to locate the file.
c. Click OK.
7. When the import is complete, click Finish.


Importing Data Definitions from MetaStage


To import the contents of an existing MetaStage publication category
into QualityStage:

1. On the QualityStage main window, choose File ➤ Import ➤
Datafile Definition ➤ From MetaStage.
The Import Data Definition from MetaStage dialog box appears.
2. Enter the name of the project you want to create in QualityStage,
and then click OK.
The MetaStage Attach dialog box appears.
3. Enter the required connection information to connect to a specific
directory, and then click Current.
The Select Publication dialog box appears.
4. Select one or more publication categories to import, and then click
Select.
The data definitions in the publication categories are imported
into QualityStage.

Note: The import fails if a selected publication category contains
multiple project objects or metadata objects associated with
multiple projects.
Even if a publication category contains only part of a project (for
example, just the project object, or just a few of the record objects), the
contents of the entire project are imported into QualityStage, including
the project object and all records and fields.

Working with Data Files


QualityStage requires a definition of the data files used by a job.
When you define a data file, you indicate the file’s name and provide a
description of the file.
For most jobs you must also define the fields and include the starting
position in the file and length in characters. The field descriptions are
also referred to as metadata.


When you configure or build a job, you select files from a list of
available files to assign those files to the job. Therefore you must
define your files before you configure or build jobs.
All jobs require at least one input file, and some require two. Most
jobs require one output (or results) file, and some require two. The
Match stage also requires files if you specify custom reports or extracts.

Input File Format


QualityStage can process only fixed-length format input files. In
general, input files must be fixed-length files with an end-of-record
terminating character, and they must contain alphanumeric data. If
your input file is in a variable-length format, you must convert it to
fixed length prior to submitting the file for processing.
QualityStage can use the Transfer stage (see Chapter 6, “Building Jobs”)
to convert some nonstandard file formats, including files with a
variable-length format, into a fixed-length format. Once the file is
in fixed-length terminated format, the other QualityStage stages can
accept the file.
For more information see the QualityStage Stages Reference Guide.

Important: File names must be eight characters or fewer, and must not contain
extensions such as .txt.
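
To picture what converting a variable-length file to a fixed-length file involves, consider the following Python sketch. It pads every record with spaces to the length of the longest record. The function name is hypothetical, and the sketch does not reproduce the Transfer stage or any QualityStage utility; it only illustrates the idea under those assumptions.

```python
def to_fixed_length(lines: list[str]) -> tuple[int, list[str]]:
    """Return the longest record length and the records padded to that length."""
    records = [line.rstrip("\r\n") for line in lines]       # drop end-of-record characters
    width = max((len(record) for record in records), default=0)
    return width, [record.ljust(width) for record in records]
```

The returned width is the record length that the datafile definition for the converted file would need to account for.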

About Defining Fields


When you create a job, you define all the fields in a data file that are
used by the job. Based on the fields you define, QualityStage
calculates the record length of the data file. Therefore your datafile
definition must account for the record length. One method to do this is
to define one field with a starting position of the last column in the
record and a length of one.
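
The record length calculation can be sketched as follows. The sketch assumes that starting positions are counted from 1 and that the record length is the highest column covered by any defined field; the function and field names are hypothetical.

```python
def record_length(fields: list[tuple[str, int, int]]) -> int:
    """fields holds (name, start position, length); positions are assumed 1-based."""
    return max((start + length - 1 for _, start, length in fields), default=0)

fields = [("NAME", 1, 20), ("ZIP", 41, 5)]
print(record_length(fields))                       # highest column covered by NAME and ZIP
print(record_length(fields + [("EOR", 80, 1)]))    # a one-column field at the last column
```

In this example the first two fields alone reach only column 45. Adding a one-character field at column 80 makes the calculated length match an 80-column record.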


About the Results File


For some jobs the results file is very similar to, if not identical to, the
input file. In these cases, to create the results file you have only to
copy the input file and make any necessary modifications, such as
adding a field.

About Arrays
The Match stage provides the ability to compare arrays of fields. An
array can comprise any number of fields, including one. To use arrays,
you need to define them using the Arrayfields dialog box.
Arrays allow you to reduce the number of cross comparisons you
would have to define with Match. For example, if you have a first
name, a middle name, and a last name field whose values might appear
in any order (for example, a first name in the last name field), arrays
compare all names without regard to order.
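
The idea of comparing names without regard to order can be illustrated with a small Python sketch. It is not the Match stage algorithm, which this guide does not describe; it only shows that treating the fields as a set removes the need to compare each field with every other field by hand. The function names are hypothetical.

```python
def normalize(values: list[str]) -> set[str]:
    """Trim, upper-case, and drop empty values."""
    return {value.strip().upper() for value in values if value.strip()}

def shared_values(array_a: list[str], array_b: list[str]) -> list[str]:
    """Return the values that occur in both arrays, in any position."""
    return sorted(normalize(array_a) & normalize(array_b))
```

For example, comparing JOHN, Q, SMITH with SMITH, JOHN and an empty middle name finds JOHN and SMITH in both records even though they sit in different fields.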

Creating a New Datafile Definition


To create a new datafile definition:
1. Do one of the following:
• On the left pane of the QualityStage main window, select a
project, and then on the Toolbar click ➤ Datafile Definition.
• On the left pane of the QualityStage main window, select a
Datafile Definitions folder, and then right-click anywhere on
the right pane and click New File.


The Add a New Datafile dialog box appears.

2. Under Name, enter up to eight alphanumeric characters, using an
alphabetic character for the first.
3. Under Description, enter up to 40 alphanumeric characters.
4. Under Code Page select DEFAULT.

Note: If you are creating a datafile definition for a Transfer stage,
you can select a code page that corresponds to the format of
the data file you want to convert. For information about
defining datafiles for the Transfer stage, see Chapter 1 of the
QualityStage Stages Reference Guide.

5. From the Language/Locale list, select one of the following:


• English
• Japanese
6. Under File Format select one of the following:
• Fixed Length Terminated – Select this option for files
compatible with the UNIX or Windows standard QualityStage
data file format. These files should contain fixed-length
records with an end-of-line terminating character. This is the
default value.
• Variable Length Terminated – Select this option for files that
have variable length records and an end-of-line terminating
character. If you select this option, you must do one of the
following:
– Use the Transfer stage to create a fixed-length output file
from the input file.
– Use the filescan utility to determine the maximum record
length in a file. filescan can also convert the file with
variable-length records to a file with fixed-length records.
For information about the Transfer stage and the filescan
utility, see the QualityStage Stages Reference Guide.
• Fixed Length Unterminated – Select this option for files
compatible with the MVS standard QualityStage data file
format. These files should contain fixed-length records and no
end-of-line terminating character.
You can select any file format to define either input or output files.
7. Click OK.
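
If you generate datafile names outside QualityStage Designer, you can check them against the limits in this procedure before you enter them. The following Python sketch is one way to do that. The function name and messages are hypothetical, and it assumes that alphanumeric means ASCII letters and digits.

```python
def validate_datafile(name: str, description: str) -> list[str]:
    """Return a list of problems; an empty list means the values fit the limits."""
    problems = []
    if not 1 <= len(name) <= 8:
        problems.append("name must be 1 to 8 characters long")
    if not (name.isascii() and name.isalnum()):
        problems.append("name must be alphanumeric")
    elif not name[0].isalpha():
        problems.append("name must start with a letter")
    if len(description) > 40:
        problems.append("description must be 40 characters or fewer")
    return problems
```

A name such as CUSTOMERS.txt fails both the length check and the alphanumeric check, which matches the rule that file names must not contain extensions.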

Copying a Datafile Definition


To create a data file definition by using an existing definition:

1. On the left pane of the QualityStage main window, select a
Datafile Definitions folder.
2. On the right pane, select the datafile definition you want to copy.
3. Do one of the following:

• On the Toolbar, click the Copy button, and then click the Paste
button.


• On the right pane, right-click the datafile definition you want
to copy, click Copy, and then right-click anywhere on the right
pane and click Paste.


The Select datafile name dialog box appears.

4. Enter up to eight alphanumeric characters for the name of the
new data file, using an alphabetic character for the first.
5. Click OK.
The new datafile is added to the file list.

Modifying a Datafile Definition


You can change the name, description, code page, locale, and file
format of an existing file.
To modify a datafile definition:
1. On the left pane of the QualityStage main window, select a
Datafile Definitions folder.
2. On the right pane, right-click the datafile definition you want to
modify, and then click Modify.


The Modify Datafile dialog box appears.

3. Under Name, enter up to eight alphanumeric characters, using an
alphabetic character for the first.
4. Under Description, enter up to 40 alphanumeric characters.
5. Under Code Page select DEFAULT.

Note: If you are modifying a datafile definition for a Transfer stage,
you can select a code page that corresponds to the format of
the data file you want to convert. For information about
defining datafiles for the Transfer stage, see Chapter 1 of the
QualityStage Stages Reference Guide.

6. Under Language/Locale, select one of the following:


• English
• Japanese
7. Under File Format select one of the following:
• Fixed Length Terminated
• Variable Length Terminated
• Fixed Length Unterminated
8. Click OK.
The old file name is automatically replaced by the new file name
wherever it is used.


Check that any jobs and stages using this file now select this file
name.

Working with Datafields


You can create, modify, and delete fields in a datafile in two ways:
• Using the QualityStage main window
• Using the Data Field wizard
The next three sections describe how to use the QualityStage main
window to create, modify, and delete datafield definitions.

Creating a Datafield Definition


To add a datafield definition:

1. On the left pane of the QualityStage main window, select a
datafile definition.
2. Do one of the following:

• On the Toolbar, click ➤ Datafield Definition.


• On the right pane, right-click anywhere, and then click New
Field.


The Add a New Datafield dialog box appears.

3. Under Name, enter up to seven alphanumeric characters.


4. Under Start Position, enter the starting position for the field as a
whole number.
5. Under Length, enter the length of the field as a whole number.
6. Under Description, enter up to 40 alphanumeric characters to
provide a description of the field.
7. From the Missing Value list, select one of the following
conventions used when a record has missing data in the field:
• S for spaces.
• Z for zero or spaces.
• N for a negative number (such as –1).
• 9 for all nines (9999).
• X indicating no missing value. This is the default value.

Note: The missing value convention is used only by the Match
stage.


8. To insert a field between existing fields or before the first field,
select the Shift all subsequent Fields check box.
QualityStage reformats the definitions by adding the length of the
inserted field to the starting position of the subsequent fields.

Note: This changes the Total Record Length.

When you are inserting a new field, if the starting position does
not correspond to the beginning of an existing field, an error
occurs. Click OK to return to the Datafield dialog box to modify
the Start Position value.
9. If you want the field to be tested by the Survive stage as an
integer, select Integer under Field Use Type.
You need to change a field to integer only if you are testing the
field by performing an arithmetic operation on it; for example,
adding or subtracting a value from an age.

Note: The field use type is used only by the Survive stage.

10. Under Field Data Type select the appropriate data type for your
field:
• Alphanumeric - This is the default value.
• Packed Decimal.
• Zoned Decimal.
• Binary Unsigned.
Any data type can be selected for either input or output files. All
numeric data types are right-justified when converted to
alphanumeric data types.
11. Do one of the following:
• Click Apply to add the datafield definition to the data file.
Information in the dialog box is cleared, letting you define
another datafield.
• Click OK to add the datafield definition to the data file and
exit the dialog box.


• Click Cancel to exit the dialog box without adding the
datafield definition to the data file.
New fields appear in the fields list on the right pane.
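
The effect of the Shift all subsequent Fields check box described in step 8 can be sketched in Python. The sketch assumes 1-based starting positions and represents each field as a (name, start position, length) tuple; the function name is hypothetical and the code is an illustration, not QualityStage's implementation.

```python
Field = tuple[str, int, int]   # (name, start position, length)

def insert_with_shift(fields: list[Field], new: Field) -> list[Field]:
    """Insert a field and move every field at or after its position to the right."""
    _, new_start, new_length = new
    if new_start not in {start for _, start, _ in fields}:
        # mirrors the error described when the position is not the start of a field
        raise ValueError("start position must match the start of an existing field")
    shifted = [
        (name, start + new_length, length) if start >= new_start else (name, start, length)
        for name, start, length in fields
    ]
    return sorted(shifted + [new], key=lambda field: field[1])
```

Inserting a 10-character field at position 11 moves a field that started at 11 to 21 and a field that started at 26 to 36, so the Total Record Length grows by 10.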

Adding more fields
To add another field:

1. Under Name, enter up to seven alphanumeric characters.


2. Under Start Position, press the plus key (+) on the numeric
keypad of your keyboard.
QualityStage calculates and enters the next available position as
the first column following the last field defined.
3. Repeat step 5 through step 11 of the previous procedure.
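
The position that the plus key fills in can be pictured with the following Python sketch. It assumes 1-based positions and takes "the last field defined" to mean the field that ends at the highest column; the function name is hypothetical.

```python
def next_start_position(fields: list[tuple[str, int, int]]) -> int:
    """Return the first column after the field that ends furthest to the right."""
    return max((start + length for _, start, length in fields), default=1)
```

With fields at positions 1 through 10 and 11 through 25, the next available position is 26.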

Modifying a Datafield Definition


To modify a datafield definition:

1. On the left pane of the QualityStage main window, select a
datafile definition.
2. On the right pane, right-click a datafield definition, and then click
Modify.


The Modify a Datafield dialog box appears.

3. Make the appropriate changes.


4. Click OK.

Deleting a Datafield Definition


You can delete any datafield definition that is not used in a job. When
you delete a field among a group of fields, a message box appears
asking whether you want to reformat the datafile definition:


• If you click Yes, QualityStage subtracts the length of the deleted
field from the starting position of the subsequent fields and from
the Total Record Length.
• If you click No, the starting positions of all remaining fields and the
Total Record Length stay the same.

Note: The Total Record Length of the file is derived from your data
definitions. If you do not have a field defined that represents the
last column of the record, QualityStage may not correctly
calculate the Total Record Length of the file.
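
The two outcomes of the reformat prompt can be sketched in Python as follows. The sketch assumes 1-based starting positions and tracks the Total Record Length as a separate value; the function name is hypothetical and the code is only an illustration of the arithmetic described above.

```python
Field = tuple[str, int, int]   # (name, start position, length)

def delete_field(fields: list[Field], total_length: int, name: str,
                 reformat: bool) -> tuple[list[Field], int]:
    """Delete a field; if reformat is True (Yes), close the gap it leaves."""
    removed = next(field for field in fields if field[0] == name)
    remaining = [field for field in fields if field[0] != name]
    if not reformat:                       # No: positions and length stay the same
        return remaining, total_length
    _, gone_start, gone_length = removed
    shifted = [
        (n, start - gone_length, length) if start > gone_start else (n, start, length)
        for n, start, length in remaining
    ]
    return shifted, total_length - gone_length
```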

Using the Data Field Wizard


You can reach the Data Field wizard from any of the Stage wizards,
which you use when you are creating or modifying stages.


Defining Arrays
To define an array:

1. On the left pane of the QualityStage main window, select a
Datafile Definitions folder.
2. On the right pane, right-click a datafile definition, and then click
Arrays.
The Arrayfields dialog box appears.

3. Under Available Fields, select the desired data field, and then
click the add button or drag the field from the Available Fields box
into the Fields in the Arrayfield box.
The data field definition appears under Fields in the Arrayfield.


To remove a field from an array, select the field and click the remove
button.


4. Repeat step 3 for additional data fields.

Note: The data fields do not need to be contiguous.

5. Under Arrayfield Name, enter up to seven alphanumeric
characters.
6. Under Arrayfield Description, enter up to 40 alphanumeric
characters.
7. From the Arrayfield Missing Value list, select one of the following
conventions used when a record has missing data in any field in
the array:
• S for spaces.
• Z for zero or spaces.
• N for a negative number (such as -1).
• 9 for all nines (9999).
• X indicating no missing value.

Note: The missing value convention is used only by the Match
stage.

8. When you finish defining your array, click Add Arrayfield.


The array appears under Arrayfields.
9. When you finish defining all arrays for the data file, click Finish.
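
The missing value conventions listed in step 7 can be read as simple tests on a field's contents. The following Python sketch shows one possible reading. How the Match stage actually interprets each convention is not described here, so treat the rules (for example, that Z means every character is a zero) and the function names as assumptions.

```python
def is_missing(value: str, convention: str) -> bool:
    """Apply one missing value convention (S, Z, N, 9, or X) to a field value."""
    text = value.strip()
    if convention == "X":                      # no missing value
        return False
    if convention == "S":                      # spaces
        return text == ""
    if convention == "Z":                      # zero or spaces
        return text == "" or set(text) == {"0"}
    if convention == "N":                      # a negative number, such as -1
        try:
            return float(text) < 0
        except ValueError:
            return False
    if convention == "9":                      # all nines
        return text != "" and set(text) == {"9"}
    raise ValueError(f"unknown convention: {convention}")

def array_has_missing(values: list[str], convention: str) -> bool:
    """True if any field in the array is missing under the convention."""
    return any(is_missing(value, convention) for value in values)
```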

Using QualityStage Extensions


Ascential offers extended features that enhance QualityStage
functionality:
• QualityStage Interface to Data Warehouse Center: If you are
using the process flow and scheduling capabilities of IBM’s Data
Warehouse Center to manage and run your QualityStage projects,
this feature allows you to import and export data to a Data
Warehouse Center data mart or data warehouse.
• Licensed Stages. QualityStage licensed stages include the
following:
– QualityStage CASS™ (Coding Accuracy Support System),
which you use to create QualityStage CASS-certified
addresses. Using QualityStage CASS-certified addresses
qualifies you for mail rate discounts from the U.S. Postal
Service, improves your return-to-sender rate, and provides you
with additional customer information that you can use to
improve customer relationships.
– QualityStage Z4Change™, which you use to update your
QualityStage CASS-certified address list for changes to the
ZIP+4 records that are provided by the USPS. This module
eliminates the need to recertify the entire mailing list because
of these periodic changes.
– QualityStage WAVES™ (Worldwide Address Verification
and Enhancement System), which you use to verify
multinational addresses against country-specific postal
reference files. For given address information, this module
corrects spelling, adds missing values, and verifies the address
information at the lowest available level for a given country:
the city level, the street level, or both.
– QualityStage SERP™ (Software Evaluation and Recognition
Program), which you use to create QualityStage SERP-certified
addresses. This module ensures that your Canadian mailing
lists meet the requirements of the Canada Post Address
Accuracy Program and thus qualify for postage discounts.

Using the Interface to Data Warehouse Center

Configuring the Interface to Data Warehouse Center
To configure the Interface to Data Warehouse Center, use the
Designer Options dialog box:
1. From the QualityStage main window select File ➤ Designer
Options.


The Designer Options dialog box appears.


2. Click the Data Warehouse tab.

3. For detailed information about using this dialog box, see the
QualityStage Guide to the Data Warehouse Center Interface.

Importing
To import datafile definitions from Visual Warehouse, choose File ➤
Import ➤ Datafile Definition ➤ From Visual Warehouse.

Exporting
To export jobs to Visual Warehouse:

1. On the right pane of the QualityStage main window, do one of the
following:
• Select the job you want to export
• Select the project whose jobs you want to export
2. Choose File ➤ Export ➤ Job ➤ To Visual Warehouse.
See the QualityStage Guide to the Data Warehouse Center Interface for
more information about how to use the Interface to Data Warehouse
Center product.


Setting Up Licensed Stages


You access licensed stages using the QualityStage main window.
These stages require a specific version of the QualityStage server, for
which you must be licensed.

More information
Consult the following documentation for more information about
licensed stages:
• The QualityStage CASS and Z4Change Guide.
• The QualityStage WAVES User Guide.
• The QualityStage SERP User Guide.



5

Setting Up Run Profiles

This chapter describes how to set up run profiles.

Creating a Run Profile


Before you can deploy and run a QualityStage job, you must define
server file structure information in the appropriate Run Profile dialog
box. QualityStage Designer relays this information to the
QualityStage server when you deploy and run your job.
You can create run profiles at any time, but you must create at least
one before you can deploy and run QualityStage jobs.

What Run Profiles Define


Run profiles define:
• The name of your QualityStage server
• The location of your project directory
• Access information (such as user ID and password)


• Other information about running QualityStage on your server


The information you enter depends on the type of QualityStage server
you connect to:
• OS/390
• UNIX
• Windows, remote or local

More About Run Profiles


Note the following about run profiles:
• You can create as many profiles as you need.
• You associate profiles with individual jobs.
• You can define one profile to be the default profile to use each time
you deploy and run any job.
• If you want a job to use an alternate profile, you need only specify
it when you first deploy the job, and QualityStage uses it instead of
the default profile.

Creating and Managing Run Profiles

Creating Run Profiles


To create a run profile:

1. On the QualityStage main window, click File ➤ Run Profiles.


The Run Profiles dialog box appears:

2. Click New.
The Select Run Profile Template window appears:

3. Select the template you want to use.


Depending on the server template you select, an appropriate
QualityStage Profile Definition dialog box appears.


For information about defining run profiles on each platform, see:
• “Defining an OS/390 Run Profile” on page 5-4.
• “Defining a UNIX or Windows Run Profile” on page 5-10.
• “Defining a Local Windows Run Profile” on page 5-16.

Copying, Modifying, or Deleting Run Profiles


To copy, modify, or delete a run profile:

1. On the QualityStage main window, click File ➤ Run Profiles.


2. Select the run profile you want to copy, modify, or delete.
3. Do one of the following:

To copy
a. Click Copy.
b. Enter a new name in the Profile box.
c. (Optional) Make any other changes you want.
d. Click OK.

To modify
a. Click Modify.
b. Make the changes you want.
c. Click OK.

To delete
a. Click Delete.

Defining an OS/390 Run Profile


Defining a profile for an OS/390 server requires that you enter:
• Host Settings, which define your OS/390 server and your account
information
• A Project Profile, which defines the fully qualified data set for the
project


Important: Some of the information required for the profile was defined
during the installation of the server software. Your system
administrator can supply any information that you require.

The Profile Definition dialog box looks like this:

To define your profile:

1. In the Profile box, enter up to 40 alphanumeric characters for the
profile name.
2. To define this profile as the default profile, select the Make
Default for all Projects check box.
3. Select OS/390 from the Host Type list.


4. Enter the following information:

Next to                     Enter
Host Name                   Either the IP address or the host name for the
                            OS/390 system.
QualityStage Procedure Lib  Up to 44 characters.
                            Location for the QualityStage job library that
                            contains the TSOPROC job.
Account                     Valid account number used for charge back,
                            audit, etc.
TCP Port                    TCP port number. The default is 23.
User ID                     Your user logon name
Password                    Your user password
Email                       (Optional) Full e-mail address for notification
                            when a job completes.
                            Important: Do not enter if your e-mail
                            system does not support the SMTP protocol.
Alternate Locale            Enter the international locale you want
                            QualityStage to use at the server to process
                            the data.
                            This value needs to be set only if you are
                            processing data for a language that is not the
                            default language of the server. For example,
                            the default language for your server is
                            French, and the data to be processed is
                            Italian.
User Information            (Optional) Any additional information that is
                            printed on the output banner page.
Printer                     (Optional) Local or remote printer to which
                            output from a batch job is routed. You can
                            enter up to eight characters. The default is
                            LOCAL.
Job Parameters              (Optional) Parameters passed to the system
                            through a jobcard. You can enter up to 60
                            characters.
VSAM DASD Volume            Either use the default asterisk (*) or, if it is
                            not supported, enter a valid DASD volume
                            (up to 6 characters). Defines temporary space
                            required by the Standardize and Match
                            stages.
Region                      Amount in multiples of 1024K of CPU storage
                            used for a region size when a job is submitted
                            for batch processing. The default is 0M.
                            If you set this value to less than the default,
                            you risk causing an ABEND, nonzero return
                            codes, or unexpected error messages. See the
                            QualityStage OS/390 Server Guide for more
                            information about this parameter.
Execution Time              (Optional) Maximum allotted execution time.
                            You can enter any value between 0 and 1440,
                            with 1440 indicating no time limit.
Execution Class             One character indicating the execution class.
                            The default is A. Check with your system
                            administrator.
Output Class                One character indicating the hold queue for
                            the TCP/IP connection. The default is H.
                            Check with your system administrator.
Disk Space (Cylinders)      (Optional) An integer from 1 through 5000.
                            Amount of primary space allocated for all of
                            the QualityStage job’s data sets.
                            If this field is left empty, primary space to be
                            allocated is taken from the default settings in
                            the SYMBOLS file.

5. When you finish, click the Project Profile tab.


The Project Profile dialog box looks like this:

Use this dialog box to set up an application area on the OS/390
server. The information you enter here is the same information
you would enter in the QualityStage ISPF panel that adds
applications.


6. Enter the following information:

Next to                     Enter
National Characters         The 3 national characters that are valid for
(valid in data set names)   data set names. The default is $#@, which
                            are the national characters that are valid in
                            the US.
                            In the UK, for example, # and @ are valid,
                            but $ is not, whereas the UK £ (pound
                            sterling sign) is valid.
Library Qualifiers          Up to 35 characters.
                            First- and second-level qualifiers for the ARD
                            (alib), Control (clib), Reports (rlib), Repository
                            (tlib), Skeletons (slib), and Uni (ulib) libraries.
                            These are the libraries where QualityStage
                            stores the staged information from the client.
Data First Qualifier        High-level qualifier for your input and output
                            data files.
Data Second Qualifiers      Second-level qualifiers for your input and
                            output data files.
VSAM Qualifiers             Up to 22 characters.
                            High-level qualifiers for VSAM files.
Work File Name Qualifiers   Up to 22 characters.
                            QualityStage jobs create certain files that are
                            cataloged but not required (except perhaps
                            for debugging purposes) once the job has
                            completed successfully.
                            You can give such data sets a distinct
                            high-level qualifier to distinguish them from
                            permanent data sets. Furthermore, if you
                            want to run the same QualityStage job more
                            than once in parallel, you may need to use
                            different work file name qualifiers to avoid
                            contention on work file data sets.
                            If you leave this field empty, the default value
                            is the data high-level qualifier.

7. When you are finished, click OK.


Your newly defined profile appears:
• In the list on the Run Profiles dialog box, and
• In the Profile list on the Job Run Options dialog box
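
The length limits given for the Project Profile fields in step 6 can be checked before the values are typed into the dialog box. The following is a minimal Python sketch and not part of QualityStage: the function names and dictionary keys are hypothetical, and reading "the 3 national characters" as "exactly 3" is an assumption. Only the limits (35 and 22 characters), the $#@ default, and the work file fallback rule come from this guide.

```python
# Hypothetical pre-check of Project Profile values; not a QualityStage API.
# Only the numeric limits and defaults below are taken from the guide.
MAX_LENGTHS = {
    "library_qualifiers": 35,
    "vsam_qualifiers": 22,
    "work_file_name_qualifiers": 22,
}


def check_project_profile(values):
    """Return a list of problems found in a dict of profile values."""
    problems = []
    national = values.get("national_characters", "$#@")  # documented default
    if len(national) != 3:  # assumption: exactly 3 characters are expected
        problems.append("national_characters must be exactly 3 characters")
    for field, limit in MAX_LENGTHS.items():
        text = values.get(field, "")
        if len(text) > limit:
            problems.append(f"{field} is longer than {limit} characters")
    return problems


def effective_work_qualifier(values):
    """An empty work file qualifier falls back to the data high-level qualifier."""
    return values.get("work_file_name_qualifiers") or values.get("data_first_qualifier", "")
```

A check of this kind is only a convenience for preparing values; the dialog box and the OS/390 server remain the authority on what is accepted.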

Defining a UNIX or Windows Run Profile


Important: We strongly recommend that you always use a remote Windows
run profile whenever your QualityStage server is running on a
Windows system. You should use a remote Windows run profile
even if the QualityStage server and QualityStage Designer are
running on the same machine.

Defining a profile for a UNIX or Windows server requires that you
enter the host settings, which define your server and your account
information. Optionally, you can enter advanced project settings,
which let you place the various project directories in different
locations.

Note: The UNIX and Remote Windows templates are similar, but they
provide different default values.

The Profile Definition dialog box looks like this:

To define your profile, do the following for either the UNIX or the
Remote Windows server template:

1. Next to Profile, enter up to 40 alphanumeric characters for the
profile name.
2. To define this profile as the default profile, select the Make
Default for all Projects check box.

3. Enter the following information:

Next to
    Enter

Host Name
    Host name or IP address for the server.

Host Server Path
    Full path for the directory in which you installed the server
    software. The default is:
    On UNIX: /Ascential/QualityStageServer70/bin
    On Windows: C:\Ascential\QualityStageServer70

Master Project Directory
    Full path for the project directory. This is the directory in which
    the data, scripts, control members, and logs are stored.
    The default is:
    On UNIX: /Ascential/QualityStageServer70/Projects
    On Windows: C:\Projects
    If this directory does not exist, QualityStage automatically creates
    it when you deploy your first job provided the user has the
    appropriate access for creating that directory.

TCP Port
    Port that the server is started on.

Email
    (Optional) Full e-mail address for notification when a job finishes.

Alternate Locale
    Enter the international locale you want QualityStage to use at the
    server to process the data.
    This value needs to be set only if you are processing data for a
    language that is not the default language of the server. For
    example, the default language for your server is French, and the
    data to be processed is Italian.
    If you are running the Multinational Standardize or the WAVES stage
    on a UNIX server, you must enter a German ISO locale.
    For more information on locales, see the QualityStage UNIX, Linux,
    and Windows Server Guide.

Local Report Data Location
    Full path to the location on the client system where prepared
    QualityStage report data is stored.
    Click to browse for a location.

4. When finished, you can either:


• Click OK. Your newly defined profile appears:
– In the list on the Run Profiles dialog box, and
– In the Profile list on the Job Run Options dialog box
• Click the Advanced Project Settings tab.
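
The host settings entered in step 3 can be pictured as a small record whose path fields start from the template defaults. The sketch below is illustrative Python only and is not how QualityStage Designer stores profiles: the class, function, and field names are hypothetical, the strict reading of "alphanumeric" for the profile name is an assumption, and the host name and port used in the example are made up. The default paths are the ones listed for the UNIX and Windows templates.

```python
from dataclasses import dataclass

# Default paths quoted from the guide; everything else is illustrative.
DEFAULTS = {
    "unix": {
        "host_server_path": "/Ascential/QualityStageServer70/bin",
        "master_project_directory": "/Ascential/QualityStageServer70/Projects",
    },
    "windows": {
        "host_server_path": r"C:\Ascential\QualityStageServer70",
        "master_project_directory": r"C:\Projects",
    },
}


@dataclass
class RunProfile:
    name: str
    host_name: str
    tcp_port: int
    host_server_path: str
    master_project_directory: str
    email: str = ""             # optional notification address
    alternate_locale: str = ""  # needed only when the data language differs from the server's


def new_profile(name, template, host_name, tcp_port, **overrides):
    """Build a profile from the 'unix' or 'windows' template defaults."""
    if not (name.isalnum() and len(name) <= 40):
        raise ValueError("profile name: up to 40 alphanumeric characters")
    settings = dict(DEFAULTS[template])
    settings.update(overrides)  # explicit values replace template defaults
    return RunProfile(name=name, host_name=host_name, tcp_port=tcp_port, **settings)


# Example with a made-up host and port.
profile = new_profile("UnixDev", "unix", "qshost", 5000, email="ops@example.com")
```

Keeping the two templates as separate default sets mirrors the note that the UNIX and Remote Windows templates are similar but provide different default values.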

The Advanced Project Settings dialog box looks like this:

With the Advanced Project Settings dialog box, you can define an
alternative location for the following directories:
• Data, where your input data files reside and the output data files
are created.
• Controls, where the control member files are created.
• Temp, where files are created and deleted during script execution.
• Logs, where the log files from running a job are created.
• Scripts, where the executable scripts are generated.
By default, QualityStage creates these directories under the directory
you specified as the Master Project Directory in the Profile Definition
dialog box. If you want any of these directories in another location, you
must specify the full path in this dialog box.
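
The default-or-override rule for the five directories can be sketched as follows. This is an illustrative Python fragment, not QualityStage code: the function name is hypothetical, and the layout master/projectname/directory is an assumption extended from the projectname/Data location that Chapter 7 gives for data files. POSIX-style paths are used so the result is the same on any client.

```python
import posixpath

# The five directories that the Advanced Project Settings tab can relocate.
SUBDIRECTORIES = ("Data", "Controls", "Temp", "Logs", "Scripts")


def project_directories(master_project_directory, project_name, overrides=None):
    """Map each directory name to its overridden or default full path."""
    overrides = overrides or {}
    return {
        name: overrides.get(name)
        or posixpath.join(master_project_directory, project_name, name)
        for name in SUBDIRECTORIES
    }


# Example: only Temp is moved; the other four stay under the master directory.
dirs = project_directories(
    "/Ascential/QualityStageServer70/Projects", "DEMO", {"Temp": "/scratch/qs"}
)
```

Any directory left out of the overrides keeps its default location, which matches the behavior described for fields left unchanged in the dialog box.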
When finished, you can either:
• Click OK. Your newly defined profile appears:
– In the list on the Run Profiles dialog box, and
– In the Profile list on the Job Run Options dialog box
• Click the FTP Settings tab.
The FTP Settings tab looks like this:

Enter the following information:

Login ID Name of the user that owns the server directory and
the master project directory and that starts the
QualityStage server.
Password Password for the Login ID.
FTP Protocol Choose either SFTP (secure FTP) or FTP.
Port FTP port number. The default is …
Public Key

Select the Blocking check box to …


Select the Passive check box to …

Defining a Local Windows Run Profile


Important: The local Windows run profile is intended only for demonstration
or testing purposes. We strongly recommend that you always use
a remote Windows run profile whenever your QualityStage server
is running on a Windows system. You should use a remote
Windows run profile even if the QualityStage server and
QualityStage Designer are running on the same machine.

Defining a run profile for a local Windows server requires that you
enter the paths to your QualityStage server software and Master
Project Directory.

The Profile Definition dialog box looks like this:

To define your profile:


1. Next to Profile, enter up to 40 alphanumeric characters for the
profile name.
2. To define this profile as the default profile, select the Make
Default for all Projects check box.

3. Enter the following information:

Next to
    Enter

Host Server Path
    Full path to the directory in which you installed the server
    software.
    The default is:
    C:\Ascential\QualityStageServer70

Master Project Directory
    Full path to the project directory. The default is: C:\Projects.
    You must create this directory before you can deploy or run a
    project.

Alternate Locale
    Enter the international locale you want QualityStage to use at the
    server to process the data.
    This value needs to be set only if you are processing data for a
    language that is not the default language of the server. For
    example, the default language for your server is French, and the
    data to be processed is Italian.
    For more information, see the QualityStage UNIX, Linux, and Windows
    Server Guide.

Local Report Data Location
    Full path to the location on the client system where prepared
    QualityStage report data is stored.
    Click to browse for a location.

4. When you are done, click OK.


Your newly defined profile appears:
• In the list on the Run Profiles dialog box, and
• In the Profile list on the Job Run Options dialog box


6

Building Jobs

QualityStage uses jobs to process data, creating various intermediate
and final stages of re-engineered data.
The processing criteria are determined by rule sets that you specify for
a job.
Jobs incorporate a number of data re-engineering stages.
QualityStage provides several stages for use at each phase of the
workflow.
In addition to the stages provided by QualityStage, you can build your
own stages and add them to your jobs.
This chapter describes the following:
• “Why You Use Jobs” on page 6-2
• “Building QualityStage Jobs” on page 6-3
• “Defining a Job Using Existing Stages” on page 6-5
• “Defining and Modifying Stages” on page 6-8

Why You Use Jobs


QualityStage provides the following stage types for investigating,
conditioning, matching, and consolidating data:

Stage Type                   What it does

Investigate                  Investigates data
Standardize                  Conditions data
Multinational Standardize    Conditions multinational address (city- and street-level) data
Match                        Matches data
Survive                      Consolidates data, including resolving conflicts within the data

The following table summarizes other stage types provided by
QualityStage. For more information about these stages, see the
QualityStage Stages Reference Guide.

Stage Type        What it does

Transfer          Generates unique IDs, manipulates field positions, generates multiple lines for a record, and processes unprintable characters.
Select            Splits files based on specific criteria.
Unijoin           Joins files based on specific criteria; moves or alters data.
Sort              Sorts files.
Collapse          Collapses files to unique records and generates frequency counts.
Parse             Parses records.
Abbreviate        Generates business name abbreviations.
Build             Rebuilds parsed records.
Program           Runs third-party programs in a QualityStage job.
Format Convert    Converts ODBC tables or delimited text files into the QualityStage standard file format. Also writes standard QualityStage files to ODBC and text-delimited formats.

If you need to perform data re-engineering activities that are not
provided by these stages, you can build your own. You can customize
these stages to build your own jobs.
Jobs you build can perform a variety of functions, such as formatting
or re-organizing data files. For example, if your source file contains
address data from more than one country, you can write a job that
splits the data file based on a country code. QualityStage uses
different rule sets for different countries. When you split the source
file into one file for each country, QualityStage can more efficiently
use the rule sets to standardize and match the records. Splitting a file
is a series of processes that you can combine in one, easy-to-use job
that you reuse as needed.
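
The idea behind the country-code example can be sketched outside QualityStage. The Python fragment below is illustrative only: in QualityStage the split is performed by stages in a job, not by code like this, and the function name, the fixed-width record layout, and the sample records are all assumptions.

```python
from collections import defaultdict


def split_by_country(records, code_start, code_length):
    """Group fixed-width records by the country code found at a given position."""
    groups = defaultdict(list)
    for record in records:
        code = record[code_start:code_start + code_length]
        groups[code].append(record)
    return dict(groups)


# Made-up records with a 2-character country code in the first two columns.
records = ["US123 Main St", "GB10 High Street", "US77 Elm Ave"]
by_country = split_by_country(records, 0, 2)
```

Each resulting group corresponds to one per-country file, which can then be processed with the rule set for that country.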

Building QualityStage Jobs


To create your data re-engineering projects, QualityStage lets you:
• Customize stages
• Build your own jobs
To build your own jobs, you:
• Define one or more stages, and then
• Add them to your jobs.
This chapter describes how to build a job. For information on
customizing specific stage types, refer to the chapters in this user
guide and in the QualityStage Stages Reference Guide that describe
the stage.

Use the QualityStage main window to create, modify, and run jobs.

Creating a New Job


Creating a new job entails:
• Creating a name for it, and then
• Adding it to the list of jobs.
To create a new job:

1. Do one of the following:


• On the left pane of the QualityStage main window:
a. Select a Project folder (or any of its subfolders)

b. On the Toolbar choose ➤ Job.


• On the left pane of the QualityStage main window:
a. Select a Jobs folder
b. Right-click anywhere on the right pane
c. Click New Job
The Add a Job dialog box appears.

2. Under Name, enter up to 8 alphanumeric characters.


3. Under Description, enter up to 40 alphanumeric characters.
4. Click OK.
Your new job is added to the list of jobs.

Renaming a Job
To rename a job or to change its description:

1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. On the right pane, right-click the job you want to rename, and
then click Modify.
The Rename a Job dialog box appears.

3. Under Name, enter up to 8 alphanumeric characters.


4. Under Description, enter up to 40 alphanumeric characters.
5. Click OK.

Defining a Job Using Existing Stages


Once you create a name for a job, you specify the stages the job is to
perform.
You assign stages to the job in the order that you want them to run.
To define a job:

1. On the left pane of the QualityStage main window, select the job
you want to add stages to.
2. Right-click anywhere on the right pane to display a list of stage
types.

A right arrow to the right of a stage type points to a submenu
listing the specific stages of that type that are defined for the
current project.
3. Click a stage type to display the submenu of stages, and then click
the stage you want to add to the job.
The stage is added to the job’s stages list.
In the right pane, when you right-click a job stage, the shortcut menu
lets you:
• Add an available stage to the current job
• Remove the selected stage from the current job
• Rearrange the order in which stages are to run

Adding Existing Stages to a Job


You define a new job by adding existing stages to it.
See “Defining and Modifying Stages” on page 6-8 to learn how to
create a new stage or modify an existing stage.

Reordering or Removing Stages in a Job


When defining the job, you may want to:
• Change the order of the stages assigned to the job, or
• Delete one or more stages.
On the right pane, right-click a job stage, and then use the shortcut
menu to reorder or remove the selected stage.

To reorder

To change the position of a stage in the job’s list of stages:

1. Right-click the stage you want to move.


2. Click Move Up or Move Down to move the stage to the desired
position.

Tip: You can also drag and drop stages to reorder the list.

To remove

To remove a stage from the job’s list of stages:

1. Right-click the stage you want to remove.


2. Click Remove.
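
Because stages run in the order in which they are listed, a job behaves like an ordered list, and the shortcut menu commands correspond to simple list operations. The following Python sketch is a hypothetical model for illustration, not the Designer's internal representation; the class, method, and stage names are made up.

```python
class Job:
    """Hypothetical model of a job as an ordered list of stage names."""

    def __init__(self, name):
        self.name = name
        self.stages = []

    def add(self, stage):
        # New stages go to the end of the run order.
        self.stages.append(stage)

    def remove(self, stage):
        self.stages.remove(stage)

    def move_up(self, stage):
        i = self.stages.index(stage)
        if i > 0:  # the first stage cannot move any higher
            self.stages[i - 1], self.stages[i] = self.stages[i], self.stages[i - 1]

    def move_down(self, stage):
        i = self.stages.index(stage)
        if i < len(self.stages) - 1:  # the last stage cannot move any lower
            self.stages[i + 1], self.stages[i] = self.stages[i], self.stages[i + 1]


job = Job("CUSTJOB")
for stage in ["STAN", "MATCH", "SURV"]:
    job.add(stage)
job.move_up("MATCH")  # MATCH now runs before STAN
```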

Setting Output Files to Include/Exclude for a Job


When defining a job, you can specify which output files the job will
generate.
To specify output files that a job will generate:
1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. On the right pane, right-click a job in the job list, and then click
Output Files.
The Output Files Written for Job dialog box appears:

To change a file’s Output status:

1. Select the entry to change.


2. Click Toggle Output.
3. Do one of the following:
• Click Exit to save any changes.
• Click Cancel to exit without saving changes.

Defining and Modifying Stages


If the list of stages in a project does not include a stage you want, you
can create a new stage from an existing one, or you can choose a stage
type and specify the stage options you want.
Creating and modifying stages is made easy through the appropriate
QualityStage wizards.
After you define a new stage or modify an existing one, you can add it
to the job you are defining.
Use the QualityStage main window to define and manage stages that
you can add to your jobs.
You can:
• Create a new stage
• Modify an existing stage
• Copy an existing stage and then modify it to create a new one
• Delete a stage that is not included in a job

Creating a New Stage


To create a new stage:
1. On the left pane of the QualityStage main window, select a Stages
folder.

2. Do one of the following:

• From the Toolbar, choose ➤ Stage, and then click the
name of the type of stage you want to add.
• Right-click anywhere on the right pane, click New Stage, and
then click the name of the type of stage you want to add.
The Stage Wizard for the stage type you chose appears.
For instructions and detailed information describing the stage you
want to define, see the appropriate chapter in this user guide or in
the QualityStage Stages Reference Guide.
3. After you define the stage, click Finish to add it to the stages list.

Modifying an Existing Stage


To modify an existing stage:
1. On the left pane of the QualityStage main window, select a Stages
folder.
2. On the right pane, right-click the stage you want to modify, and
then click Modify.
The Stage Wizard for the stage type you chose appears.
For instructions and detailed information describing the stage you
want to define, see the appropriate chapter in this user guide or in
the QualityStage Stages Reference Guide.
3. After you modify the stage, click Finish.

Creating a New Stage from an Existing One


To copy a stage to create a new one:

1. On the left pane of the QualityStage main window, select a Stages
folder.

2. On the right pane, right-click the stage you want to copy, and then
click Copy.
3. Right-click anywhere on the right pane, and then click Paste.
The Select a stage name dialog box appears.

4. Enter up to 8 alphanumeric characters.


5. Click OK.
Your new stage is added to the stages list.
6. Do one of the following:
• Double-click the new stage.
• Right-click the new stage, and then click Modify.
The Stage wizard appears.
For instructions and detailed information describing the stage you
want to define or modify, refer to the appropriate chapter in this
user guide or in the QualityStage Stages Reference Guide.
7. After you modify the stage, click Finish.

Deleting a Stage from a Project


To delete a stage from the list of stages for a project:

1. On the left pane of the QualityStage main window, select a Stages
folder.
2. On the right pane, right-click the stage you want to delete, and
then click Delete.

Note: You cannot delete a stage from a project if it is used in a job.


7

Deploying Jobs

Creating output from any QualityStage job involves:


• Deploying the job, and then
• Running it.
This chapter describes the following:
• “Deploying a Job” on page 7-5
• “Deploying a Job Creates a Project File Structure” on page 7-9
• “Moving Input Data to the Correct Project Library Location” on
page 7-9

About Deploying and Running Jobs

Run Profiles
Before you deploy and run a job for the first time, you must set up a
run profile. For information about setting up run profiles, see
Chapter 5, “Setting Up Run Profiles”.

About Deploying Jobs


Each time you are ready to run a newly created or newly modified
job, you must first deploy it.
When you deploy a job, QualityStage does the following:
• Creates files on your client, including control members, Abstract
Run Descriptions (ARDs), pattern-action data, and dictionaries, as
required by the job.
• Transfers these files to your server using FTP.
• Builds a JCL or shell script on the server based on its type (OS/390,
UNIX, or Windows).
After deploying a job, you manually move the input data files into the
appropriate project directories on your server host.
The deploying process is described in “Deploying a Job” on page 7-5.

About Running Jobs


Running a job involves executing and producing output based on your
definitions.
When you run a job, QualityStage executes the JCL or shell script on
your server. Each time you modify a job (for example, by adding or
modifying a stage, modifying a data file definition, or editing a
Pattern-Action file), you must redeploy the job before you run it.
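
The deploy-before-run rule can be expressed as a small state check. This Python sketch only encodes the rule as stated; whether and how QualityStage itself detects an out-of-date deployment is not described here, and the class and method names are hypothetical.

```python
class JobState:
    """Hypothetical tracker for the deploy-before-run rule."""

    def __init__(self):
        self.deployed = False  # a newly created job has not been deployed

    def modify(self):
        # Any change (stage, data file definition, Pattern-Action file)
        # makes the last deployment out of date.
        self.deployed = False

    def deploy(self):
        self.deployed = True

    def run(self):
        if not self.deployed:
            raise RuntimeError("job changed since last deployment; redeploy first")
        return "running"


state = JobState()
state.deploy()
status = state.run()  # allowed: the job was deployed and not modified since
```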

Tip: Because QualityStage builds the JCL and shell script on the
server, you can deploy and run the same job on all server types.

The running process is described in “About Running Jobs” on page 8-1.

About Run Modes


You need to select a mode in which to deploy and run your job.
QualityStage offers three run modes:
• File mode
• Data stream mode
• Parallel Extender mode (available only on UNIX systems)

File Mode
File mode should be familiar to all preexisting QualityStage users,
because until INTEGRITY version 3.6, it was the only processing
mode available.
If you are using file mode, QualityStage inputs your data file into the
first stage in your job and processes the entire file before handing the
results to the next stage in your job. QualityStage also generates
interim files while processing, which remain on your system.
File mode has several advantages:
• Allows you to run sections of jobs; this is a useful feature for
debugging.
• Allows you to generate Match reports and default extracts. See
“Working with Match Reports” on page 13-1 for more information.
• May be faster if you are processing with a single CPU (Central
Processing Unit).

Data Stream Mode


If you are using data stream mode, QualityStage puts the first record
of your input files into the first stage. QualityStage then passes the
first record of your input files to the second stage and inputs the
second record into the first stage. This process continues until all
records are processed by all the stages.
The only stage that is processed differently is the Sort stage, which
must read and process all records together. Therefore, having a large
number of Sort stages can slow processing considerably.
Data stream mode has several advantages:
• Does not produce intermediate files, which means the process
takes up much less disk space
• Can take advantage of multiple CPUs

However, data stream mode cannot be used:


• To run Investigate stages
• To generate Match reports and default extracts
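
The difference between file mode and data stream mode can be illustrated with a small Python sketch. It is a conceptual model only and not how QualityStage is implemented: the function names are hypothetical, stages are reduced to plain functions applied to one record at a time, interim files are stood in for by in-memory lists, and the multiple-CPU aspect of data stream mode is not modeled.

```python
def run_file_mode(records, stages):
    """Each stage processes the whole data set; interim results are kept."""
    interim = []
    data = list(records)
    for stage in stages:
        data = [stage(record) for record in data]
        interim.append(list(data))  # stands in for an interim file on disk
    return data, interim


def run_stream_mode(records, stages):
    """Each record flows through all stages; no interim results are stored."""
    stream = iter(records)
    for stage in stages:
        stream = map(stage, stream)  # chain the stages lazily
    return stream  # records are pulled through the chain one at a time


stages = [str.strip, str.upper]  # two toy per-record stages
final, interim = run_file_mode([" a ", " b "], stages)
streamed = list(run_stream_mode([" a ", " b "], stages))
```

Both functions return the same final records. The file-mode version also retains one interim result per stage, which is what makes it possible to inspect or rerun part of a job, at the cost of extra storage.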

Parallel Extender Mode


As of INTEGRITY version 6.0, you can use the Parallel Extender
mode to improve the performance of CPU-intensive QualityStage jobs
by running them in parallel. This mode extends QualityStage’s data
quality management capabilities by accelerating throughput and
reducing processing time.
Parallel Extender mode works in a way similar to data stream mode.
The advantages of Parallel Extender mode are like those for data
stream mode:
• Does not produce intermediate files, which means the process
takes up much less disk space
• Takes advantage of multiple CPUs and parallel processing to
increase processing speed
In addition, DataStage users can integrate QualityStage Parallel
Extender jobs with their DataStage Parallel Extender jobs. They use
the DataStage Designer Job Sequencer to do this. For information
about integrating DataStage and QualityStage Parallel Extender jobs,
see Chapter 6 of the DataStage Designer Guide.

Important: Parallel Extender mode is available only on UNIX systems.


Before you can deploy or run a job in Parallel Extender mode, you
must purchase a Parallel Extender license and have the Parallel
Extender software installed and running on your QualityStage
server host system.
For information about licensing and system requirements, speak to
your Ascential Software account representative.

Comparing Run Modes


All modes give you the same results. Since performance is greatly
dependent on your particular processing environment, you may want
to evaluate each method to see which one is the most efficient for you.
However, file mode processes data differently from data stream and
Parallel Extender modes, and there are several things you need to
take into consideration when designing your tests.
If you want to verify that all modes produce the same results, put a
Sort stage before any Match stages in the job executed in data stream
mode or Parallel Extender mode. This is because file mode
automatically creates a Sort stage before any Match stages, whereas
data stream mode and Parallel Extender mode do not.
You cannot write out a nonstandard QualityStage file and read it back
in within the same job if you are executing it in either data stream
mode or Parallel Extender mode. For example, you cannot write out a
delimited text file and then read it back in within a single job. This is
because data stream mode and Parallel Extender mode do not allow
files that are not in the QualityStage standard file format to be passed
between stages within a QualityStage job.

Deploying a Job
When you first create a job, you must deploy it without running it.
When you first deploy a job, the project directory is created with the
appropriate subdirectories (Controls, Data, Logs, etc.). After all
directories in the project directory exist, you must move your data files
into the project directory or the data library. The data in these files is
used when you run the job.
If you have not set up a run profile, you must do so before you deploy
any jobs. For information about setting up run profiles, see Chapter 5,
“Setting Up Run Profiles”.

Important: On Windows systems, you must re-deploy jobs that were created
and deployed using versions of QualityStage or INTEGRITY
earlier than version 7.0. See the UNIX, Linux, and Windows Server
Guide for details.

How to Deploy a Job


To deploy a job:

1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. From the jobs list on the right pane, select the job you want to
deploy.
3. Do one of the following:

• On the Toolbar, click .


• Right-click the job you want to deploy, and then click Run.
The Job Run Options dialog box appears.
4. Select a run profile from the Profile list.
If you have not defined a run profile, click Setup. For information
about how to set up run profiles, see Chapter 5, “Setting Up Run
Profiles”.
5. In the Job Run Options dialog box, under Select Run Options,
clear the Run check box so that only the Deploy check box is
selected.
QualityStage saves this option setting after each deployment.
If you are using a remote server, the Wait for Completion check
box appears. By default it is selected.
6. Click one of the following:
• Execute Data Stream Mode
• Execute File Mode
• Execute Parallel Extender Mode
See “About Run Modes” on page 7-2 for more information on the
three modes.

Note: In Parallel Extender mode, if you intend to run a project built
with an earlier version of QualityStage (or INTEGRITY), you
must deploy the project using Parallel Extender mode before
you can run it.

Deploying Jobs in Data Stream Mode or Parallel Extender Mode


After you click either:
• Execute Data Stream Mode, or
• Execute Parallel Extender Mode
QualityStage deploys the job and creates the project file structure.
If you selected Wait for Completion, or if you are using a local
Windows server, status messages appear in the Status window.
When the job finishes, the following message box appears:

Deploying Jobs in File Mode


If you clicked Execute File Mode, the File Mode Execution dialog box
appears.
The contents of this screen vary, depending on the job you are
deploying. See the following sections for examples of specific File Mode
Execution screens:
• “Running Investigate Jobs” on page 9-23
• “Running Standardize Jobs” on page 10-24
• “Running Multinational Standardize Jobs” on page 11-9
• “Running Match Jobs” on page 12-42
• “Running Survive Jobs” on page 14-18

The Deploy check box is selected as in the preceding Job Run Options
dialog box.

Using the File Mode Execution Dialog Box for Deploying Jobs
In file mode, QualityStage creates data files that you can then use for
debugging a job. The File Mode Execution dialog box lists all the
stages in the job. Use this dialog box to set starting and ending points
for deploying the job.
By default, all stages listed are run from first to last. However, to
select a subset, do the following:

1. Select the stage you want to start with.


2. Click Set Starting Stage.
A Start marker appears in the Start/End column to the left of the
stage.
3. Select the stage you want to end with.
4. Click Set Ending Stage.
An End marker appears in the Start/End column to the left of the
stage.
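
Setting a starting and an ending stage amounts to selecting a contiguous slice of the job's stage list. The sketch below is illustrative Python with a hypothetical function name and made-up stage names; treating an unset marker as "first stage" or "last stage" follows the default described above, and rejecting a start that comes after the end is an assumption.

```python
def stages_to_run(stages, start=None, end=None):
    """Return the stages from the Start marker through the End marker, inclusive."""
    first = stages.index(start) if start is not None else 0
    last = stages.index(end) if end is not None else len(stages) - 1
    if first > last:  # assumption: the starting stage may not follow the ending stage
        raise ValueError("starting stage must not come after ending stage")
    return stages[first:last + 1]


all_stages = ["INVEST", "STAN", "MATCH", "SURV"]
subset = stages_to_run(all_stages, start="STAN", end="MATCH")
```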
To deploy the job:

1. Click Run From Start to End.


The progress of the deployment is noted in the Status box.
When deploying has successfully finished, the following message
appears:

2. Click OK.

You must now move your input files into the appropriate project
directory. See “Moving Input Data to the Correct Project Library
Location” on page 7-9 for information on how to do this.

Deploying a Job Creates a Project File Structure


When you deploy a job, QualityStage creates the appropriate project
file structure on the server based on the information you provide in
the run profile (see “What Run Profiles Define” on page 5-1).
Into this file structure, QualityStage puts the files defined in
QualityStage Designer, generates the JCL or shell script, and stores
the output from the deployment.
You are responsible for providing QualityStage with your input data
files in the correct project library location.

Moving Input Data to the Correct Project Library Location


After you deploy your job, you need to make your input data files
available to your QualityStage server by moving them to the correct
project library location as described in this section.

Deploying Jobs on an OS/390 Server


On an OS/390 server you inform QualityStage of the location of your
data by specifying the high-level and second-level qualifiers in your
run profile. You must put your data files in this location before you
run the job. For more information, see “Defining an OS/390 Run
Profile” on page 5-4.

Deploying Jobs on a UNIX Server


On a UNIX server, QualityStage assumes that your data is located by
default in the projectname/Data directory of your Master Project
Directory, which you define in your run profile.
Optionally, you can specify the directory in which your data resides
with the Advanced Project Settings tab of the UNIX Run Profile
Definition dialog box. In either case, you must put your data files in
this location before running the job.
If you expanded the samples.tar file from your installation,
QualityStage automatically created a Projects directory in the
/Ascential/QualityStageDesigner<version> directory. You can use
/Ascential/QualityStageDesigner<version>/Projects as the template’s Master
Project Directory, or you can create another directory. If you do so, you
must specify the full path of this directory in your run profile (see
“Defining a UNIX or Windows Run Profile” on page 5-10).
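
Moving an input file into place can be scripted on the server host. The following Python sketch is one possible approach and is not supplied with QualityStage: the function name and its parameters are hypothetical, and it assumes the default projectname/Data layout unless an alternative data directory (as set on the Advanced Project Settings tab) is passed in. It copies rather than moves, so the original file is kept.

```python
import shutil
from pathlib import Path


def stage_input_file(source, master_project_directory, project_name, data_directory=None):
    """Copy an input file into the project's Data directory and return the new path."""
    if data_directory:
        target_dir = Path(data_directory)  # location overridden in the run profile
    else:
        target_dir = Path(master_project_directory) / project_name / "Data"
    if not target_dir.is_dir():
        # The directory is expected to exist once the job has been deployed.
        raise FileNotFoundError(f"deploy the job first; missing {target_dir}")
    return Path(shutil.copy2(source, target_dir))
```

Checking for the directory first reflects the order of operations in this chapter: deploy the job so that the project file structure exists, and only then put the data files in place.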

Deploying Jobs on a Windows Server


On a Windows server, QualityStage assumes by default that your data
is located in the projectname\Data directory of your Master Project
Directory, which you define in your run profile.

Important: This directory must already exist on your PC; QualityStage
Designer does not create it for you. If you do not create the
directory, QualityStage displays an error message indicating that
it cannot find the path.

Optionally, you can specify the full path and directory in which your
data resides with the Advanced Project Settings tab of the Profile
Definition dialog box. You must put your data files in this location
before running the job.

Deploying Jobs on a Local Windows Server


On a local Windows server, you must put your data files in the
projectname\Data directory of your Master Project Directory, which you
define in your run profile (see “Defining a Local Windows Run Profile”
on page 5-16).



8

Running Jobs

After you deploy your job and move your input data files into the
appropriate directory, you are ready to run the job.
This chapter describes the following:
• “Running a Job from QualityStage Designer” on page 8-2
• “Running a Job from the Command Line on UNIX Systems” on
page 8-8
• “Running a Job from the Command Line on Windows Systems” on
page 8-9
• “Restarting a Job” on page 8-12
• “Viewing Job Output Files” on page 8-13

About Running Jobs


You can run jobs in one of two ways:
• From QualityStage Designer
• From the command line
For most jobs, QualityStage stores the result files in the same location
as the input data files.


For more information on the file structure used by QualityStage, refer
to the QualityStage UNIX, Linux, and Windows Server Guide and the QualityStage
OS/390 Server Guide.

Remote Servers
With OS/390, UNIX, Linux, and Windows servers, QualityStage ends the
connection from the client to the server after all files have been
transferred.
When the job is finished, you can receive an e-mail message
containing the job results. This message is sent to the e-mail address
you define in the run profile.
Optionally, you can stay connected to the server while the job runs
and receive status messages in the Status window. This is useful for
following the status of a job run during development and testing of
your project.

Local Windows Server


With local Windows servers, you remain connected to the server while
a job runs. When the job finishes, a message appears to notify you.

Running a Job from QualityStage Designer


1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. From the jobs list on the right pane, select the job you want to run.
3. Do one of the following:

• On the Toolbar, click the Run icon.


• Right-click the job you want to run, and then click Run.


4. Select a run profile from the Profile list.


If you have not defined a run profile, click Setup to define one. For
information about how to set up run profiles, see Chapter 5,
“Setting Up Run Profiles”.
5. In the Job Run Options dialog box, under Select Run Options,
clear the Deploy check box so that only the Run check box is
selected.
QualityStage saves this option setting after each run.

Important: You can clear the Deploy check box if you made no changes
(such as adding or modifying a stage, modifying a data file definition,
or editing a Pattern-Action file) to your job. However, if you make
changes to your job or to any of its stages, you need to deploy
it again. For information about deploying jobs, see Chapter 7,
“Deploying Jobs”.

6. (Optional) If you want to run a QualityStage formatted report
after you run the job, do the following:
a. Select Prepare Report Data.
This specifies that prepared report data output will be put in
the Data directory for the project.
b. Select Retrieve Report Data, and specify the maximum file
size to retrieve.
The output file will be copied to the location specified in the
run profile for local report data.
See Chapter 15, “Working with QualityStage Reports”, for more
information about preparing data for formatted reports.
7. (Optional) Clear the Wait for Completion check box.
The Wait for Completion check box appears if you are using a
remote server. By default it is selected.

Advanced Run Options
8. (Optional) Click Advanced Run Options to see other options you
can set, depending on whether:
• Your QualityStage server is running on an OS/390 system
• You are running in Parallel Extender mode
• You are running a job using one or more Match stages


Select the options you need.
The next sections describe the following advanced run options:
• OS/390 job options
• Parallel Extender options
For information about Match job options, see “Match job options”
on page 12-44.

OS/390 job options
If you are running a job on an OS/390 server and you select
Advanced Run Options, the following screen appears:

You can specify the following parameters:

Parameter                   Value
Data First Qualifier        High-level qualifier for your input and output data files.
Data Second Qualifiers      Second-level qualifiers for your input and output data files.
VSAM Qualifiers             Up to 22 characters. High-level qualifiers for VSAM files.
Work File Name Qualifiers   Up to 22 characters. QualityStage jobs create certain files that are cataloged but not required (except perhaps for debugging purposes) once the job has completed successfully. You can give such data sets a distinct high-level qualifier to distinguish them from permanent data sets. Furthermore, if you want to run the same QualityStage job more than once in parallel, you may need to use different work file name qualifiers to avoid contention on work file data sets. If you leave this field empty, the default value is the data high-level qualifier.
Disk Space (Cylinders)      (Optional) An integer from 1 through 5000. Amount of primary space allocated for all of the QualityStage job’s data sets. If this field is left empty, primary space to be allocated is taken from the default settings in the SYMBOLS file.
Run Identifier              (Optional) A single uppercase letter (A – Z) or a number from 0 – 9. The Run ID is suffixed to the name of the MVS job on the server. Use Run IDs to distinguish among two or more MVS jobs running at the same time on the server.
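
The value limits in the table can be checked before a job is submitted. The following is a minimal sketch, assuming the limits exactly as stated above; the function name and the messages are hypothetical, and QualityStage Designer performs its own validation.

```python
import re


def check_os390_options(qualifier: str = "", disk_space: str = "", run_id: str = "") -> list:
    """Return a list of problems found in the supplied values (empty if none)."""
    problems = []
    # Qualifier fields that state a limit accept up to 22 characters.
    if len(qualifier) > 22:
        problems.append("qualifier is longer than 22 characters")
    # Disk Space (Cylinders) is optional; if given, an integer from 1 through 5000.
    if disk_space:
        if not re.fullmatch(r"[0-9]+", disk_space) or not 1 <= int(disk_space) <= 5000:
            problems.append("disk space must be an integer from 1 through 5000")
    # Run Identifier is optional; if given, one uppercase letter or one digit.
    if run_id and not re.fullmatch(r"[A-Z0-9]", run_id):
        problems.append("run identifier must be a single character, A-Z or 0-9")
    return problems


# Hypothetical values for illustration.
print(check_os390_options(qualifier="QS.WORK", disk_space="250", run_id="A"))
```

An empty list means that the values are within the documented limits.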


Parallel Extender job options
If you are running a job on a UNIX server, the following screen
appears:

If you are running a job using Parallel Extender, you can specify
the kind of sorting you want to use.

Execute mode
9. Click one of the following:


• Execute Data Stream Mode
• Execute File Mode
• Execute Parallel Extender Mode
See “About Run Modes” on page 7-2 for more information on the
three run modes.

Note: In Parallel Extender mode, if you intend to run a project built
with an earlier version of QualityStage, you must deploy the
project using Parallel Extender mode before you can run it.

Running in Data Stream Mode or Parallel Extender Mode


After you click either:
• Execute Data Stream Mode, or
• Execute Parallel Extender Mode
QualityStage runs the job.


If you selected Wait for Completion, or if you are using a local
Windows server, status messages appear in the Status window.
After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.

Running in File Mode


If you click Execute File Mode, the File Mode Execution dialog box
appears.
The File Mode Execution dialog box lists all the stages in the job. You
must run the same stages that you selected when you deployed the
job. See “Deploying Jobs in File Mode” on page 7-7 for more
information.
By default, all stages listed are run from first to last. However, to
select a subset, do the following:

To select a subset
1. Select the stage you want to start with.


2. Click Set Starting Stage.
A Start marker appears in the Start/End column to the left of the
stage.
3. Select the stage you want to end with.
4. Click Set Ending Stage.
An End marker appears in the Start/End column to the left of the
stage.
5. Under Select Run Options:
a. Select Deploy, Run, or both.
See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.
b. (Optional) Select Prepare Report Data.
This specifies that prepared report data output will be put in
the Data directory for the project.


c. (Optional) Select Retrieve Report Data, and specify the
maximum file size to retrieve.
The output file will be copied to the location specified in the
run profile for local report data.
See Chapter 15, “Working with QualityStage Reports”, for more
information about preparing data for formatted reports.

To run the job
6. Click Run From Start to End.


The progress of the run is noted in the Status box.
7. When the run finishes successfully, a confirmation message
appears.

8. Click OK.
After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.

Running a Job from the Command Line on UNIX Systems


When you deploy a job, three script files are created, one for each of
the three run modes. The scripts are created in the Scripts subdirectory
of the project directory.

Mode scripts
The base name of the script file is the name of the job. Its extension
identifies the run mode:
• File mode scripts end with .stp
• Data stream mode scripts end with .scr
• Parallel Extender mode scripts end with .par


For example, if the job name is TEST, the following three scripts are
created:
• TEST.stp
• TEST.scr
• TEST.par
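
The naming rule can be expressed as a small lookup. This is a sketch for illustration; the function and dictionary names are hypothetical.

```python
# Script extension for each run mode, as listed above.
SCRIPT_EXTENSIONS = {
    "file": ".stp",
    "data stream": ".scr",
    "parallel extender": ".par",
}


def script_name(job_name: str, run_mode: str) -> str:
    """Return the run script name: the job name plus the mode extension."""
    return job_name + SCRIPT_EXTENSIONS[run_mode]


for mode in SCRIPT_EXTENSIONS:
    print(mode, "->", script_name("TEST", mode))
```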

How to run mode scripts
To run any of the scripts, use the following syntax at a UNIX shell
prompt:
scriptname -ipe.env proc_env_file -ipe.env proj_env_file
scriptname is the full or relative path of the run script.
proc_env_file is the full or relative path of the environment file
associated with the job. It is located in the Scripts directory. Its file
name is the name of the job with an .env extension.
proj_env_file is the full or relative path of the project environment file.
It is located in the project directory. Its file name is ipe.env.sh.

Example
To run the TEST.par Parallel Extender mode script, enter the following
command from the Scripts directory:
TEST.par -ipe.env TEST.env -ipe.env ../ipe.env.sh
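
If you launch run scripts from another program, you can assemble the same command line from the project directory and the job name. The following is a minimal sketch that follows the syntax and file locations described above; the function name and the /Projects/TEST path are assumptions for illustration, and the sketch only builds the argument list without running it.

```python
from pathlib import PurePosixPath


def run_command(project_dir: str, job_name: str, extension: str = ".par") -> list:
    """Build: scriptname -ipe.env proc_env_file -ipe.env proj_env_file."""
    project = PurePosixPath(project_dir)
    scripts = project / "Scripts"
    return [
        str(scripts / (job_name + extension)),           # run script
        "-ipe.env", str(scripts / (job_name + ".env")),  # job environment file
        "-ipe.env", str(project / "ipe.env.sh"),         # project environment file
    ]


# Hypothetical project location.
print(" ".join(run_command("/Projects/TEST", "TEST")))
```

The resulting list uses full paths, so the command does not depend on the current directory.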

Running a Job from the Command Line on Windows Systems


When you deploy a job, two script files are created, one for running in
file mode, the other for running in data stream mode. The scripts are
created in the Scripts subdirectory of the project directory.

Environment variable
You must set the TK_DO_NOT_RUN_WITH_REG_ASSOCIATIONS
environment variable as follows:
TK_DO_NOT_RUN_WITH_REG_ASSOCIATIONS=1
export TK_DO_NOT_RUN_WITH_REG_ASSOCIATIONS

Mode scripts
The base name of the script file is the name of the job. Its extension
identifies the run mode:
• File mode scripts end with .stp
• Data stream mode scripts end with .scr

For example, if the job name is TEST, the following two scripts are
created:
• TEST.stp
• TEST.scr

How to run mode scripts
To run any of the scripts, use the following syntax at an MKS bash or
ksh shell prompt:
scriptname -ipe.env proc_env_file -ipe.env proj_env_file
scriptname is the full or relative path of the run script.
proc_env_file is the full or relative path of the environment file
associated with the job. It is located in the Scripts directory. Its file
name is the name of the job with an .env extension.
proj_env_file is the full or relative path of the project environment file.
It is located in the project directory. Its file name is ipe.env.sh.

Example
To run the TEST.scr script, enter the following command from the
Scripts directory:
TEST.scr -ipe.env TEST.env -ipe.env ../ipe.env.sh

Using Parallel Extender Persistent Data Sets Instead of QualityStage Data Files
The Parallel Extender mode scripts are designed to run with:
• Traditional QualityStage input and output data files (default)
• Parallel Extender persistent data sets
Use the -noimport 1 option to indicate that the script should not try to
import data from a text file, because the input data is already in data
set format. The -noimport 1 option also indicates that the script
produces data sets as output, not text files.

Example
To run the TEST.par Parallel Extender mode script on persistent data
sets, enter the following command from the Scripts directory:
TEST.par -ipe.env TEST.env -ipe.env ../ipe.env.sh -noimport 1


Integrating QualityStage Parallel Extender Jobs with DataStage Parallel Extender Jobs
The Job Sequencer in DataStage Designer lets you integrate
QualityStage and DataStage Parallel Extender jobs.
For complete information about the DataStage Job Sequencer, see
DataStage Designer Guide.
To create a job sequence that includes both QualityStage and
DataStage Parallel Extender jobs:

1. Add QualityStage and DataStage Parallel Extender jobs to the
Job Sequence Diagram window.
• Use the ExecCommand activity for each QualityStage Parallel
Extender job in the sequence.
• Include DataStage Parallel Extender jobs as Job activities.
2. Link the activities with triggers appropriate for your job sequence.
3. For each QualityStage ExecCommand activity:
a. In the Properties dialog box, select the ExecCommand page, and
then fill in the Command property with the full path to the
QualityStage Parallel Extender mode script to execute.
For example, to run the TEST.par Parallel Extender script
located in /Projects/TEST/Scripts, enter the following in the
Command property:
/Projects/TEST/Scripts/TEST.par
b. Fill in the Parameters property with the parameters of the
QualityStage Parallel Extender mode script, as follows:
-ipe.env job_env_file -ipe.env project_env_file -noimport 1
job_env_file is the full path to the environment file associated
with the job, located in the Scripts directory. The file name has
a .env extension.
project_env_file is the full path to the project environment file,
located in the project directory. Its file name is ipe.env.sh.
-noimport 1 indicates that the script treats input and output
data as Parallel Extender persistent data sets rather than as
text files.


For example, the parameters associated with the command
specified in step a are:
-ipe.env /Projects/TEST/Scripts/TEST.env -ipe.env /Projects/TEST/ipe.env.sh -noimport 1
4. See Chapter 6 of DataStage Designer Guide for instructions on
how to:
a. Fill in all other properties required.
b. Set up each DataStage Parallel Extender job activity.

Important: QualityStage Parallel Extender jobs read and write persistent
data sets with schemas defined as follows:
• Only one field is defined in the record. Field length = total record
length.
• Field type is raw.
DataStage Parallel Extender jobs that interface with QualityStage
Parallel Extender jobs must account for the schema requirements of
the QualityStage job in order to work properly.

Restarting a Job
If your job halts before finishing, you can restart it at a specific
stage.
To restart:

1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. From the jobs list on the right pane, select the job you want to
restart.
3. Do one of the following:

• On the Toolbar, click the Run icon.


• Right-click the job you want to restart, and then click Run.


4. On the Job Run Options screen select the Run check box. Clear the
Deploy check box if necessary.
5. Click Execute File Mode.
6. On the File Mode Execution screen click Set Starting Stage to
move the job starting point to the stage that failed.
7. Click Run From Start to End.
QualityStage builds and then submits new JCL or a new script, which starts
executing at the stage that previously failed.

Note: On OS/390 systems, you can also manually edit the QualityStage
JCL and include a RESTART parameter on the JOB card. For a
description of how to do this, see the QualityStage OS/390 Server
Guide.

Viewing Job Output Files


Except on OS/390 systems, the results (output) files from your job are
in the same location as your input data files on the server. You can use
a text editor to view the results.
If you are using a remote server, you can view the files as described
here:
• For OS/390 servers, you can use a 3270 terminal emulation
program on your client workstation to access and view the files on
the server.
• For UNIX servers, you can use a Telnet connection from your client
workstation to your UNIX server to view the results files.

QualityStage reports
See Chapter 15, “Working with QualityStage Reports” for information
about how to create and generate QualityStage formatted reports.

QualityStage Data File and Report Viewer
See Chapter 16, “Using the QualityStage Data File and Report
Viewer” for information about how to use the QualityStage Data File
Viewer and Report Viewer.



9

Defining Investigate Stages

Investigating data is the second step in Phase Two of the data
re-engineering workflow. Remember, Phase Two is all about
understanding the nature and content of the source data. Following
the four-phase process discussed in Chapter 2 “The Workflow for
Creating Re-engineered Data” will streamline your data
re-engineering implementation.

Figure 9-1 Phase Two: Understand the Nature and Content of The
Source Data


Investigating the source data helps you to understand the quality of
the source data and determine business rules at the data level that
you can use in designing your re-engineering application.
QualityStage includes the Investigate stage for assessing the content
of the source data. This stage organizes, parses, classifies, and
analyzes patterns in the source data, and it operates on both
single-domain data fields and free-form text fields.
This chapter explains how to use the Investigate stage and assumes
that you have already prepared and specified the source data files as
described in Chapter 2, “The Workflow for Creating Re-engineered
Data” and Chapter 6, “Building Jobs”.

Using an Investigate Stage


The Investigate stage looks at each record, field by field, analyzing the
data content of the fields you specify. When you use the Investigate
stage on free-form text fields, it parses them into individual tokens.
Using the Investigate stage, you perform four basic steps:

1. Choose the type of investigation to perform, depending on the
source data and what you are trying to find:
• Word investigation, used on free-form fields.
• Character investigation, used on single-domain fields.
2. Specify the fields you want to investigate. Be sure these fields
were defined appropriately when you prepared the data for
QualityStage.
3. Choose the rule set to use in classifying tokens or words.
4. Run the investigation on each field.
The Investigate stage provides the following sets of reports:
• Pattern—contains the pattern analysis of the data entities.
• Word Frequency—shows the frequency distribution of the field
values.
• Word Classification.


You use these reports:


• To evaluate the results
• To develop the next stage of your re-engineering process

Creating an Investigate Stage


To create an Investigate stage:
1. On the left pane of the QualityStage main window, select a Stages
folder.
2. Do one of the following:

• From the Toolbar, choose ➤ Stage ➤ Investigate.


• Right-click anywhere on the right pane, click New Stage, and
then click Investigate.


The Investigate Stage wizard appears.

3. Under Name, enter up to eight alphanumeric characters.


4. Under Description, enter up to 40 characters.
5. Under Options, select one of the following:
• Character Discrete Investigation
• Character Concatenate Investigation
• Word Investigation
6. Under Data File, select the input file.
7. Click Next.


The next dialog box depends on whether you are performing a character
or a word investigation.

Using Character Investigation


Character Investigation allows you to explore a single-domain field
(one that contains one data element or token, such as Social Security
number, telephone number, date, or ZIP code) to analyze and classify
data. The result of a Character Investigation provides a frequency
distribution and pattern analysis of the tokens.
You can investigate more than one single-domain field with a single
Investigate stage. You have the option of investigating multiple fields
individually (referred to as Character Discrete) or integrated as one
unit of data (referred to as Character Concatenate).

Using the Pattern Reports


The Investigation process generates two Pattern reports, written as
files, that list the unique patterns and their frequency counts. Each
file name consists of the first seven characters of the job name with
one of the following suffixes appended:

p.FRQ The Pattern report in ascending order by pattern.


p.SRT The Pattern report in descending order by frequency
count.
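
The naming rule for the two Pattern report files can be sketched as follows. The function name and the job name ZIPCHECK are hypothetical; only the seven-character truncation and the two suffixes come from the description above.

```python
def pattern_report_files(job_name: str) -> dict:
    """Return the two Pattern report file names for a job."""
    stem = job_name[:7]  # only the first seven characters are used
    return {
        "ascending by pattern": stem + "p.FRQ",
        "descending by frequency count": stem + "p.SRT",
    }


# Hypothetical eight-character job name.
for order, file_name in pattern_report_files("ZIPCHECK").items():
    print(order, "->", file_name)
```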

In addition, the process generates a file with the name job.FRQ, which
displays the tokens and patterns for all records. The layout of this file
differs between the two Character Investigation options:
• For the CONCATENATE option, the first column contains the
frequency count, followed by frequency percentage, the pattern,
and the entire fields.
• For the DISCRETE option, the first column contains the field
name, followed by the frequency count, followed by the frequency
percentage, the pattern, and the entire fields.


A row with an [X] indicates the beginning of a new pattern group.

A partial example of the Pattern report from a Character
CONCATENATE investigation of ZIP Code in pattern order:

Frequency Count  Percent  Token  Field
00000001         0.481%   01923  [X] | 01923
00000001         0.481%   22903  [X] | 22903
00000001         0.481%   27636  [X] | 27636
00000001         0.481%   32501  [X] | 32501
00000001         0.481%   48197  [X] | 48197
00000001         0.481%   53818  [X] | 53818
00000002         0.962%   63005  [X] | 63005

By default, QualityStage provides one sample for each unique pattern
in the Pattern report. You can increase the number of samples
displayed for each unique pattern; for example, you might want to see
four samples for each pattern.
You can also limit the frequencies that are displayed. You might not
want to see the low frequency patterns, the ones that appear only once
or twice. You can set a cutoff count for the frequency. You can change
either or both of these default settings through the Advanced Options
button.
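
The frequency columns and the effect of a cutoff can be sketched as follows. The row layout (an eight-digit, zero-padded count and a percentage with three decimal places) is inferred from the example report; the function name is hypothetical, and treating the cutoff as a minimum count is an assumption about how QualityStage applies it.

```python
from collections import Counter


def pattern_rows(tokens, cutoff=1):
    """Return 'count percent token' rows in ascending token order."""
    total = len(tokens)
    counts = Counter(tokens)
    rows = []
    for token in sorted(counts):
        count = counts[token]
        if count < cutoff:
            continue  # assumption: tokens below the cutoff count are omitted
        rows.append(f"{count:08d} {100 * count / total:.3f}% {token}")
    return rows


# 208 made-up records: one ZIP Code occurs twice, the rest occur once.
zips = ["63005", "63005"] + ["Z%03d" % i for i in range(206)]
print(pattern_rows(zips, cutoff=2))
```

With 208 records, a count of 1 corresponds to 0.481% and a count of 2 to 0.962%, which matches the percentages in the example report.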

Using Discrete Investigation


The Character Discrete option allows you to investigate multiple
single-domain fields individually with one job. Each field is treated as
a separate token for frequency count and pattern analysis. The
Pattern reports group the unique patterns of each field together. For
example, if you were doing an investigation on state and ZIP Code
fields, the Pattern reports would display all state tokens followed by
all ZIP Code tokens. This option allows you to investigate a large
number of fields with little effort.


Using Concatenate Investigation


The Character Concatenate option allows you to perform cross-field
correlations between multiple fields to determine relationships. With
this option, you select two or more fields from anywhere in the record
(that is, the fields do not have to be contiguous) to be investigated as a
single data field. To create the pattern analysis, the tokens are
concatenated with no spaces between them. When the fields are
displayed as a sample, a space is inserted between fields.

Using the Field Mask


A field mask specifies, for each character position in the field,
whether the reports display the actual character (C), display its type
(T: a for alpha, n for numeric, or b for blank), or skip it (X). When a
character is skipped, it is not included in the frequency count or the
pattern analysis, but it is displayed as part of the sample.
You use the C field mask when you want to inspect the actual values
in your fields to make sure there is no false data in a field; for
example, 99999 for a ZIP Code or 111111111 for a Social Security
number. You use the T field mask when you want to inspect the type
of data in a character position; for example, with telephone numbers
as nnn-nnn-nnnn or (nnn)nnn-nnnn.
You use the X field mask when you only want to include the data from
the field in the sample but not as a token or part of the token for
investigation. For example, you want to investigate the first two
characters of a ZIP Code to determine the frequency distribution
based on state. You would set the field mask for the ZIP code to
CCXXX. In the fourth column of the pattern reports, you would only see
the first two characters. The frequency count is based on the number
of records in the file that start with the first two characters of the ZIP
Code. In the last column, you would see all five characters of the ZIP
Code in the sample representation; for example:


ZIP 00000026 0.229% 02 [X] | 02914
ZIP 00001092 9.639% 40 [X] | 40351
ZIP 00000719 6.347% 46 [X] | 46202
ZIP 00000542 4.784% 51 [X] | 51301
ZIP 00001986 17.530% 60 [X] | 60634
ZIP 00000364 3.213% 87 [X] | 87105
You can also use the X field mask with the Character Concatenate
option to specify one or more fields to be displayed as part of the
sample only. Using the previous example, you could also select the
state field and set its field mask to X for all characters. The Pattern
report displays the frequency counts for the first two characters of the
ZIP Code and the full five characters of the ZIP Code along with the
state in the sample column; for example:
ZIP 00000026 0.229% 02 [X] | 02914 RI
ZIP 00001092 9.639% 40 [X] | 40351 KY
ZIP 00000719 6.347% 46 [X] | 46202 IN
ZIP 00000542 4.784% 51 [X] | 51301 IA
ZIP 00001986 17.530% 60 [X] | 60634 IL
ZIP 00000364 3.213% 87 [X] | 87105 NM
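
The effect of the C, T, and X mask codes can be sketched in a few lines. This is an illustration of the rules described in this section, not QualityStage code. The function names are hypothetical, and showing punctuation unchanged under the T mask is an assumption based on the nnn-nnn-nnnn example.

```python
def mask_char(ch: str, code: str) -> str:
    """Apply one mask code (C, T, or X) to one character."""
    if code == "C":
        return ch            # show the actual character
    if code == "T":
        if ch.isalpha():
            return "a"       # alpha
        if ch.isdigit():
            return "n"       # numeric
        if ch == " ":
            return "b"       # blank
        return ch            # assumption: punctuation such as "-" is shown as is
    if code == "X":
        return ""            # skipped: not part of the token
    raise ValueError("unknown mask code: " + code)


def apply_mask(value: str, mask: str) -> str:
    """Return the token that is counted for one field."""
    return "".join(mask_char(ch, code) for ch, code in zip(value, mask))


def concatenate(fields):
    """fields: (value, mask) pairs. Return (token, sample) for a Character
    Concatenate investigation: tokens are joined without spaces, and the
    sample shows the complete fields separated by a space."""
    token = "".join(apply_mask(value, mask) for value, mask in fields)
    sample = " ".join(value for value, _ in fields)
    return token, sample


print(apply_mask("02914", "CCXXX"))
print(concatenate([("02914", "CCXXX"), ("RI", "XX")]))
```

The second call reproduces the first row of the last example: the token is the first two characters of the ZIP Code, and the sample shows the full ZIP Code followed by the state.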


Creating a Character Investigate Stage


If you select either Character Discrete or Character Concatenate
Investigation, the Character Mode dialog box appears.

1. Under Available Fields, select the first field to investigate.


2. Click Add To Selected Fields.


The Mask Field Selection dialog box appears.

3. To set all characters in the field to one investigation type (type,
the actual character, or no investigation), click the appropriate
button (either All T, All C, or All X).
To set the mask for an individual character in the field, click that
character repeatedly until it shows the desired mask.
4. Click OK.
5. Repeat steps 1 through 4 for additional fields to be investigated.
6. To change the number of samples or the frequency cutoff, click
Advanced Options.
The Advanced Options dialog box appears.

7. Make the appropriate changes, and then click OK.


8. Click Finish.
To modify the mask field and change the investigation type for any
selected field:

1. Under Selected Fields, select the field for which you want to
modify the investigation type.
2. Click Change Mask.
The Mask Field Selection dialog box appears.
3. Make the appropriate changes, and then click OK.

Using Word Investigation


Word Investigation parses free-form data fields into individual tokens,
which are analyzed to create patterns. In addition, Word Investigation
provides frequency counts on the tokens. To create the patterns, Word
Investigation uses a set of rules for classifying personal names,
business names, and addresses.
You have several options on how you want tokens to be evaluated by
the investigation process, such as determining what appears in
frequency reports. You also have the option of standardizing the
samples. If you select this, Investigate uses Standardize to
standardize the samples. See Chapter 10, “Defining Standardize
Stages”, for a detailed discussion of the standardization process.
Word Investigation generates three sets of reports:
• Pattern.
• Word Frequency.
• Word Classification.
In addition, the process generates a file with the name job.PAT, which
displays the tokens and patterns for all records in the file.


Using Rule Sets


Word Investigation provides pre-built rule sets for investigating
patterns on names and postal addresses:
• NAME (for individual and organization names).
• ADDR (for street and mailing addresses).
• AREA (for city, state, ZIP code, province, locale and so on).
The rule sets are defined for a specific country. The name of a
country’s rule set is prefixed with a two-character country identifier;
for example, USNAME.
When you specify a Word Investigation, you select the rule set by which
you want your fields investigated. You can select only one rule set for
a Word Investigation stage, but you can select multiple fields to be
investigated.
The Word Investigation process parses the free-form data field into
individual elements, or tokens. A token is a word, a number, or a
mixture of the two, separated from other tokens by one or more spaces or
special characters. The process compares each token with the classified
tokens in the Classification table for that rule set.
If the token matches the word in the Classification table, Investigate
assigns the class for that token to represent it in the pattern. For
tokens that do not match any classified token, Investigate examines
the pattern and assigns classes as shown in the following table.

Class   Description
^       Numeric containing all digits, such as 1234
?       Unknown token containing one or more words, such as CHERRY HILL
>       Leading numeric containing numbers followed by one or more letters, such as 123A
<       Leading alpha containing letters followed by one or more numbers, such as A3
@       Complex mix containing alpha and numeric characters that do not fit into either of the above classes, such as 123A45 and ABC345TR
0       Null
-       Hyphen
/       Slash
&       Ampersand
#       Number sign
(       Left parenthesis
)       Right parenthesis
~       Special containing special characters that are not generally found in addresses, such as !, \, @, ~, %, etc.
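
The default classes in the table can be sketched with a few regular expressions. This is an approximation for illustration, not the QualityStage implementation. The function names are hypothetical, the small classification dictionary in the example stands in for a rule set's Classification table (its class letters are taken from the sample report patterns), and collapsing adjacent unknown words into a single ? is an assumption based on the CHERRY HILL example.

```python
import re

SINGLE_CHARACTER_CLASSES = {"-", "/", "&", "#", "(", ")"}


def default_class(token: str) -> str:
    """Assign a default class to a token that is not in the Classification table."""
    if token == "":
        return "0"                                   # null
    if token in SINGLE_CHARACTER_CLASSES:
        return token                                 # the character is its own class
    if re.fullmatch(r"[0-9]+", token):
        return "^"                                   # numeric
    if re.fullmatch(r"[0-9]+[A-Za-z]+", token):
        return ">"                                   # leading numeric
    if re.fullmatch(r"[A-Za-z]+[0-9]+", token):
        return "<"                                   # leading alpha
    if re.fullmatch(r"[A-Za-z]+", token):
        return "?"                                   # unknown word
    if re.fullmatch(r"[A-Za-z0-9]+", token):
        return "@"                                   # complex mix
    return "~"                                       # other special characters


def pattern(tokens, classification=None):
    """Build the pattern for a list of tokens."""
    classification = classification or {}
    classes = []
    for token in tokens:
        cls = classification.get(token.upper()) or default_class(token)
        # Assumption: consecutive unknown words share one "?".
        if cls == "?" and classes and classes[-1] == "?":
            continue
        classes.append(cls)
    return "".join(classes)


# Hypothetical Classification table entries.
table = {"W": "D", "STREET": "T", "DR": "T"}
print(pattern(["8541", "W", "72ND", "STREET"], table))
print(pattern(["6806", "ROCKLEDGE", "COVE"], table))
```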

Using Pattern Reports


Pattern reports display the unique patterns and frequency count in
two files. Each file name consists of the first seven characters of the
job name with one of the following suffixes appended:

p.FRQ The Pattern report in ascending order by pattern.


p.SRT The Pattern report in descending order by frequency
count.
The Pattern Report for Word Investigation differs from the report of
the same name for Character Investigation; among other differences, it
provides more columns.
The Pattern Report displays:
• In the first column, the frequency count.
• In the second column, the frequency as a percentage of the number
of records in the file.
• In the third column, the pattern for the field.
• In the fourth column, a [X] for the beginning of each sample set.
• In the fifth column, the entire field.

• If you requested, the standardized field is appended at the end of
the line.

A partial example of the Pattern report sorted by frequency count for a
Word Investigation on an address field, with one standardized sample
for each pattern, is:

Frequency
Count     Percent  Pattern  Field                         Standardized Fields
00000051  24.519%  ^?T      [X}| 15423 COUSTEAU DR      | 15423 COUSTEAU
00000037  17.788%  ^?       [X}| 6806 ROCKLEDGE COVE    | 6806 ROCKLEDGE COV
00000027  12.981%  ^D>T     [X}| 8541 W 72ND STREET     | 8541 W 72ND
00000010   4.808%  ^D?T     [X}| 3625 SE HOWARD DRIVE   | 3625 SE HOWARD
00000010   4.808%  ^D?      [X}| 1304 N MAIN            | 1304 N MAIN
00000009   4.327%  ^D>S     [X}| 4405 W 128TH ST        | 4405 W 128th
00000006   2.885%  ^D>      [X}| 1537 E 37TH            | 1537 E 37TH

Note that the vertical lines (|) are used to separate the field from the
pattern and from the standardized presentation. If you do not select
the standardize option, the standardized fields are not present.
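If you post-process a Pattern report outside QualityStage, a line can be split on the vertical lines. The following Python sketch assumes the layout of the sample above (whitespace-separated count, percent, and pattern, then the sample marker, then the field and the optional standardized field). The names PatternRow and parse_pattern_row are hypothetical, and a field value that itself contains a vertical line would not parse correctly.

```python
from typing import NamedTuple, Optional


class PatternRow(NamedTuple):
    count: int
    percent: float
    pattern: str
    field: str
    standardized: Optional[str]  # None when the standardize option was not selected


def parse_pattern_row(line: str) -> PatternRow:
    """Split one Pattern report line into its columns (layout assumed from the sample)."""
    head, _, rest = line.partition("|")
    # head holds: frequency count, percent, pattern, sample marker
    count, percent, pattern = head.split()[:3]
    field, separator, standardized = rest.partition("|")
    return PatternRow(
        count=int(count),
        percent=float(percent.rstrip("%")),
        pattern=pattern,
        field=field.strip(),
        standardized=standardized.strip() if separator else None,
    )
```

Because the second column is the frequency as a percentage of the number of records, dividing the count by the percentage gives an estimate of the record count; for the first sample row, 51 / 0.24519 is approximately 208.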

Using Word Frequency Reports


The Word Frequency reports display the unique tokens and frequency
count in two files. Unless you indicate otherwise, only the classified
tokens (that is, tokens listed in the Classification table) are included.


The filenames for these reports are the first seven characters of the job
name with the following appended to it:
a.FRQ The Word Frequency report in ascending order by
token.
c.FRQ The Word Frequency report in descending order by
frequency count.
You have the option of including unclassified tokens along with the
classified tokens in these reports with the Advanced Options dialog
box. When you include the unclassified tokens, you generate a report
of all tokens in your input file.

The Word Frequency report displays:


• In the first column, the frequency count.
• In the second column, the token.
A partial example of the Word Frequency report sorted by frequency
count for a Word Investigation on an address field is:

Frequency
Count Token
0000000030 W
0000000018 ST
0000000017 DR
0000000015 DRIVE
0000000015 STREET
0000000013 BOX
0000000011 E
0000000009 COURT
0000000008 TERR

The Word Frequency report assists you in reviewing the quality and
content of your data. When sorted by frequency, the report allows you
to determine quickly the values present in your data. When sorted by
token, the report assists in identifying alternate representations, such
as misspellings, of your data.


Using Word Classification Reports


Word Classification reports display the tokens in two files. One Word
Classification report displays only the classified tokens along with
their class. The other report displays the unclassified tokens in the
same file format as the Classification table.
The filenames for these reports are the first seven characters of the job
name with the following appended to it:

u.DLT The Word Classification report displaying only the
classified tokens in ascending order by token.
n.DLT The Word Classification report displaying only the
unclassified tokens in alphanumeric order by token.

Note: To generate the unclassified token report (the second file listed
above), you must specify Include Unclassified Alphas in Word
Frequency Files in the Advanced Options dialog box.
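The naming rule for the Pattern, Word Frequency, and Word Classification report files can be summarized in a short Python sketch. The indicators come from the lists in this chapter; the function name report_filenames, the dictionary keys, and the job name in the usage note are hypothetical, and the sketch assumes the filename is exactly the seven-character prefix followed by the indicator.

```python
from typing import Dict

# Indicators appended to the first seven characters of the job name.
REPORT_INDICATORS = {
    "pattern_by_pattern": "p.FRQ",
    "pattern_by_frequency": "p.SRT",
    "word_frequency_by_token": "a.FRQ",
    "word_frequency_by_count": "c.FRQ",
    "classified_tokens": "u.DLT",
    "unclassified_tokens": "n.DLT",
}


def report_filenames(job_name: str) -> Dict[str, str]:
    """Return the expected report filename for each Word Investigation report."""
    prefix = job_name[:7]  # only the first seven characters are used
    return {report: prefix + indicator for report, indicator in REPORT_INDICATORS.items()}
```

For a job named INVADDR1, for example, the Pattern report sorted by frequency count would be expected in INVADDRp.SRT.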

The Word Classification report of classified tokens displays all tokens
in your data file that are also listed in the Classification table for the
rule set you selected. The report is in alphanumeric order by the
token. This report displays:
• In the first column, the frequency count.
• In the second column, the token.
• In the third column, the class as assigned in the Classification
table.

A partial example of the Word Classification report of classified tokens
for a Word Investigation on an address field is:

Frequency Count Token Class


0000000003 APT M
0000000007 AVE T
0000000002 BEND G
0000000005 BLVD T
0000000013 BOX B
0000000004 COLL E
0000000014 CT T
0000000001 CTR E
0000000032 DR T
0000000015 E D

The Word Classification report of unclassified tokens displays all
tokens that are not included in the Classification table for the rule set
you specified. This report is in alphanumeric order by the token. This
report has the same file format as the Classification table. This report
displays:
• In the first column, the token.
• In the second column, the standardization for the token (which for
this report is the same as the token).
• In the third column, a question mark (?) as a place holder for the
class.
• In the fourth column, the frequency count preceded by a semi-colon
(;), which indicates the beginning of a comment.

A partial example of the Word Classification report of unclassified
tokens for a Word Investigation of an address field is:

Frequency
Token Standardization Class Count
ALDEN ALDEN ? ;0000000001
ALVAMAR ALVAMAR ? ;0000000001
AMESBURG AMESBURG ? ;0000000001
ANN ANN ? ;0000000002
ANTIOCH ANTIOCH ? ;0000000001
ARLINGTON ARLINGTON ? ;0000000002
ARROWHEAD ARROWHEAD ? ;0000000001
AVON AVON ? ;0000000001
B B ? ;0000000001
BAKER BAKER ? ;0000000001
BARTON BARTON ? ;0000000001

The format of this report allows you to merge entries with existing
Classification tables (.CLS files). You can use this report to fine-tune
your rule sets for investigating and standardizing data by adding to an
existing Classification table or creating a new one. See Appendix E,
“Customizing and Testing Rule Sets”, for details on customizing your
rule sets.
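Before merging entries into a Classification table, you might want to review only the unclassified tokens that occur often. The following Python sketch reads lines in the four-column layout shown above (token, standardization, class placeholder, and a frequency count after a semicolon). The names UnclassifiedRow, parse_unclassified_row, and frequent_tokens are hypothetical, and the sketch assumes that tokens contain no embedded spaces.

```python
from typing import List, NamedTuple


class UnclassifiedRow(NamedTuple):
    token: str
    standardization: str
    token_class: str  # "?" placeholder in the unclassified report
    count: int


def parse_unclassified_row(line: str) -> UnclassifiedRow:
    """Parse one line of the unclassified-token report."""
    # The semicolon starts a comment that carries the frequency count.
    body, _, comment = line.partition(";")
    token, standardization, token_class = body.split()
    return UnclassifiedRow(token, standardization, token_class, int(comment.strip()))


def frequent_tokens(lines: List[str], min_count: int) -> List[str]:
    """Return the tokens whose frequency count is at least min_count."""
    rows = (parse_unclassified_row(line) for line in lines if line.strip())
    return [row.token for row in rows if row.count >= min_count]
```

Applied to the sample report above with a minimum count of 2, the sketch returns ANN and ARLINGTON.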

Specifying Advanced Options


When you configure an Investigate stage, you can control how tokens
appear in the reports and how many of them appear. You can also
specify record delimiters and choose to standardize your Pattern
report samples.
The options you have for controlling how tokens appear in reports are:
• Treat Successive Unclassified Words As One Word.
This option strips out the spaces between unclassified words
(concatenating them into one word); for example, MARTIN LUTHER
KING becomes MARTINLUTHERKING. This option reduces the
number of patterns.
• Include Numbers in Word Frequency Files.
This option lists all number tokens in both Word reports. For
example, when investigating an address field, you probably do not
want to see house and apartment numbers, but you might want to
see numbers if you are investigating part numbers.
• Include Unclassified Alphas in Word Frequency Files.
This option includes all word tokens that are not in the
Classification Table in both Word reports. If you do not select this
option, the Word reports only include tokens from the
Classification Table.
• Include Mixed Types and Punctuated Words in Word Frequency
Files.
This option includes tokens with leading or trailing numerics,
such as 109TH and 42ND, in both Word reports.
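The effect of Treat Successive Unclassified Words As One Word can be illustrated with a small Python sketch. It is a simplified model, not the product's logic: the function name merge_unclassified_runs is hypothetical, and the set of classified tokens passed in stands in for the Classification table.

```python
def merge_unclassified_runs(tokens, classified):
    """Concatenate each run of successive unclassified words into one token.

    `classified` is a set of tokens assumed to be in the Classification table;
    numbers and mixed tokens are left untouched and end the current run.
    """
    merged = []
    run = []
    for token in tokens:
        if token.isalpha() and token not in classified:
            run.append(token)  # extend the current run of unclassified words
        else:
            if run:
                merged.append("".join(run))
                run = []
            merged.append(token)
    if run:
        merged.append("".join(run))
    return merged
```

With BLVD as the only classified token, MARTIN LUTHER KING BLVD becomes MARTINLUTHERKING BLVD, so street names of one, two, or three words all contribute to the same pattern.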

You can select to display your tokens in the reports in one of the
following forms:
• Standard Abbreviation — the standardized representation of the
token from the Classification table.
• Original Spelling — the form as the token appears in the data file.
• Correct Spelling — allows the Investigation process to correct any
misspellings if the Classification table has a weight assigned to the
token.
By default, QualityStage provides one sample for each unique pattern
in the Pattern report. You can increase the number of samples
displayed for each unique pattern; for example, you might want to see
four samples for each pattern.
You can also limit the frequencies that are displayed. You might not
want to see the low frequency patterns, the ones that appear only once
or twice. You can set a cutoff count for the frequency. You can change
either or both of these default settings through the Advanced Options
button.

You can also specify what characters separate tokens and whether
special characters are included as a token with the following options:
• Separator List.
This list includes all special characters that separate tokens.
• Strip List.

This list includes all special characters from the Separator List
that are not to be a token. For example, the pound sign (#) by
default is not part of this list; therefore, APT#3A is three tokens:
APT, #, and 3A.
You can edit these lists to add or remove special characters. Note that
the space special character is included in both lists.
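The interaction between the Separator List and the Strip List can be sketched as follows. This Python example is a simplified model under stated assumptions: the function name tokenize is hypothetical, and the default lists used here (space and pound sign as separators, space only in the strip list) are a minimal illustration rather than the lists supplied with QualityStage.

```python
def tokenize(text, separator_list=" #", strip_list=" "):
    """Split text at separator characters.

    A separator that is also in the strip list is discarded;
    any other separator becomes a token of its own.
    """
    tokens = []
    current = []
    for ch in text:
        if ch in separator_list:
            if current:
                tokens.append("".join(current))
                current = []
            if ch not in strip_list:
                tokens.append(ch)  # separator kept as its own token
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

With these lists, APT#3A yields the three tokens APT, #, and 3A. Adding the pound sign to the strip list yields only APT and 3A.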
In addition, you can standardize the samples generated for the
Pattern report with the Standardize Representative Records option.
When you select this option, the Investigate stage invokes the
Standardize stage. Note that you need to have a Pattern-Action file
and a Dictionary file in your rule set. See Chapter 10, “Defining
Standardize Stages”, and Appendix C, “Defining Investigate Stages”,
for details.


Creating a Word Investigation Stage


If you select Word Investigation, the Word Mode dialog box appears.

1. Next to Available Rule Sets, select a rule set.


2. Under Available Fields, select the field that this rule set is to
standardize, and then click .


The field appears under Standard Fields.
3. Repeat step 2 for additional fields to be investigated.

Important: You must order the fields in the Standard Fields list in an
order used by the rule set. For example, for the Place rule set,
the field order must be: city, state, ZIP; and for the Names
rule set, the order must be: first name, middle, last name,
suffix.

4. When all desired fields for the rule set are listed, click Add Rule.
The rule set and fields to be standardized appear under Scheduled
Processes.
5. To change the investigation options, click Advanced Options.
The Advanced Options dialog box appears.

6. Make the appropriate changes, and then click OK.
See “Using Word Investigation” on page 9-11 for a detailed
description of these options.

Note: To reset Separator List and the Strip List to the supplied
special characters, click Restore Defaults.

7. Click Finish.


Running Investigate Jobs


Add the stage to a job
Before you can use your Investigate stage, you must add it to a job.
You can either:
• Add your Investigate stage to an existing job, or
• Create a new job, and then add your Investigate stage to it.
For information about creating new jobs, see “Creating a New Job” on
page 6-4. For information about adding stages to jobs, see “Adding
Existing Stages to a Job” on page 6-6.
After you add your Investigate stage to a job, you can run it.

Setting up the file structure
If this is the first job that you run in the project, you must set up the
file structure for the project. See Chapter 7, “Deploying Jobs”, for
instructions.
Once your data files are available to QualityStage, run the Investigate
stage:

How to run the stage
1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. From the jobs list on the right pane, select the job you want to run.
3. Do one of the following:

• On the Toolbar, click .


• Right-click on the job you want to run, and then click Run.


The Job Run Options dialog box appears. It looks like this:

4. Select a run profile from the Profile list.


If you have not defined a run profile, click Setup. For information
about how to set up run profiles, see Chapter 5, “Setting Up Run
Profiles”.
5. Under Select Run Options, select Deploy, Run, or both.
See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.
6. (Optional) If you are running a Word Investigation job, select one or
more of the following reports:
• Pattern Report
• Word Frequency Report
• Word Classification Report

7. (Optional) If you want to run a QualityStage formatted report
after you run the job, do the following:
a. Select Prepare Report Data.
This specifies that prepared report data output will be put in
the Data directory for the project.
b. Select Retrieve Report Data, and specify the maximum file
size to retrieve.
The output file will be copied to the location specified in the
run profile for local report data.
See Chapter 15, “Working with QualityStage Reports”, for more
information about preparing data for formatted reports.
8. (Optional) Click Advanced Run Options to see other options you
can set, depending upon:
• The job you are running
• The profile you are using
Select the options you need. For information about the advanced
run options, see “Advanced Run Options” on page 8-3.

Run mode
9. Click one of the following:


• Execute File Mode
• Execute Parallel Extender Mode
See “About Run Modes” on page 7-2 for more information about
run modes.

Note: If you intend to run, in Parallel Extender mode, a project
built with an earlier version of QualityStage (or
INTEGRITY), you must deploy the project using Parallel
Extender mode before you can run it.


Running in Parallel Extender Mode


After you click Execute Parallel Extender Mode, QualityStage runs
the job.
If you selected Wait for Completion, or if you are using a local
Windows server, status messages appear in the Status window.
When the job finishes running, a message like this one appears:

After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.


Running in File Mode


After you click Execute File Mode, the File Mode Execution screen
appears:

The File Mode Execution dialog box lists all the stages in the job. You
must run the same stages that you selected when deploying the job.
See “Deploying Jobs in File Mode” on page 7-7 for more information.
By default, all stages listed are run from first to last. However, you
can select a subset of stages to run. For information about how to
select a subset, see “To select a subset” on page 8-7.

To run the job
1. Click Run From Start to End.


The progress of the run is noted in the Status box.


When the run has successfully finished, a message like this one
appears:

After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.



10

Defining Standardize Stages

Conditioning data is the first step in Phase Three of the data
re-engineering workflow. Remember, Phase Three is about the design
and development of the data re-engineering application. If you follow
the four-phase process discussed in Chapter 2, “The Workflow for
Creating Re-engineered Data”, you will streamline your data
re-engineering implementation.

Figure 10-1 Where you are in Phase Three: Design and Develop the
Data Re-engineering Application

Conditioning the input data ensures that each type of data has the
same type of content and format; that is, the data is internally consistent.
Conditioned data is also called standardized data. Standardized data
is important for:
• Effectively matching data (step two)


• Facilitating a consistent format for the output data (step three)

Free-form fields
Free-form fields can contain any alphanumeric information of any
length that is less than or equal to the maximum field length defined
for that field.
For example, an address field might contain address data that
includes numbers, letters, and special characters, such as 53 Main St.
#301, or 1416 West Road.

Fixed-formatted fields
Fixed-formatted fields, on the other hand, contain only one specific
type of information, such as only numeric, only character, or only
alphanumeric, and that information has a specific format.
For example, a date of birth field such as 01/29/55 or a social security
number field such as 123-33-1234, both include numbers and special
symbols that appear in a specific format.
The Standardize stage parses both field types into single-domain
fields. This creates a consistent representation of the input data,
corrects any misspellings, and incorporates business and industry
standards.
This chapter explains:
• How to use the Standardize stage included with QualityStage
• How to create your own Standardize stages
It assumes that you have already prepared and specified the input
data files as described in Chapter 9, “Defining Investigate Stages”.

Using the Standardize Stage


Standardize uses the content and placement of the data within the
context of a record to determine the meaning of each data element. It
includes stages for standardizing information in data fields such as
name, address, city, state, and ZIP code. The output files from a
Standardize stage can be used by the Match stage or by other stages to
create your own jobs.


To correctly parse and identify each element or token, and place them
in the appropriate field in the output file, Standardize uses rule sets
that are designed to meet the name (individual and business) and
address conventions of a specific country. To see a list of country rule
sets available with QualityStage, scroll down the list of Available Rule
Sets in the Standardize Wizard – Command definition dialog box.
Additionally, the Standardize rule sets can standardize the
representation of any data, and append additional information from
the input data, such as sex.
The Standardize rule sets are the same as those used in the
Investigation process. You can run these rules out of the box or
customize them to handle data challenges not covered by the standard
rule sets.
Using a Standardize stage requires that you:

1. Specify an input data file.


2. Decide which standardization job to use:
• Use the Standardize stage.
• Create your own.
3. Decide which rule set to use:
• Select a country rule set included with QualityStage to
standardize on country postal code requirements.
• Select a business-intelligence rule set.
• Create your own rule set to standardize nonaddress fields.
4. Run the Standardize stage on the data file.


About Rule Sets


The rule sets provided for you in the Standardize stage fall into these
categories:

Category: Domain Pre-Processor
Number of Rule Sets: One for each country
Description: For a specific country, identifies and assigns a data
domain to the name, address, and area fields in each record.
You can use the output from this file as the input to the
country-appropriate Domain-Specific rule sets.

Category: Domain-Specific
Number of Rule Sets: Three for each country
Description: For a specific country, standardizes each data domain
as follows:
• Name including individual names, organization names, attention
instructions, and secondary names.
• Address including unit number, street name, type, and directionals.
• Area including cities, states, and ZIP codes (in the U.S., for
example).
Creates consistent and industry-standard data storage structures, and
matching structures such as blocking keys and primary match keys.

Category: Validation
Number of Rule Sets: Four, for U.S. only
Description: For a specific country, standardizes and validates the
format and value of common business data including:
• Phone Number
• Tax ID or Social Security Number
• eMail Address
• Date

Standardization Processing Flow for U.S. Records


The following diagram illustrates the Standardize stage processing
flow using Domain Pre-Processor and Domain-Specific rule sets to
standardize the records of a U.S. input file. The same workflow is
representative of other countries used with the Standardize stage.

Input File (U.S. Records)
  |
U.S. DOMAIN Pre-Processor Rule Set
  |
Intermediate File
  |
  +-- U.S. NAME Domain-Specific Rule Set -- Name Data Structures
  +-- U.S. ADDR Domain-Specific Rule Set -- Address Data Structures
  +-- U.S. AREA Domain-Specific Rule Set -- Area Data Structures

Domain Pre-Processor Rule Sets


These rule sets evaluate the mixed-domain input from a file for a
specific country (for example, the U.S.). Domain Pre-Processor rule
sets follow a naming convention that starts with a country
abbreviation and ends with PREP (an abbreviation for pre-processor).
The following table shows examples:

Rule Set Name Country


USPREP United States
GBPREP Great Britain
CAPREP Canada (English speaking)

These rule sets do not perform standardization but parse the fields in
each record and filter each token into one of the appropriate
Domain-Specific column sets, which are Name, Area, or Address.

Why You Use the Domain Pre-Processor Rule Sets


Since input files are rarely domain-specific, these rule sets are critical
when preparing a file for standardization. Fields can contain data that
do not match their metadata description. Here is an example:

Metadata Label Data Content


Name 1 John Doe
Name 2 123 Main Street Apt. 456
Address 1 C/O Mary Doe
Address 2 Boston, MA 02111

where the domains are:

Domain Name Data Content


Name John Doe
Name C/O Mary Doe
Address 123 Main Street Apt. 456
Area Boston, MA 02111


In addition, other problems arise when:


• Information continues across multiple column sets
• More than one data domain is present within a single column set
For example:

Metadata Label Data Content


Name 1 John Doe and Mary
Name 2 Doe 123 Main Str
Address 1 eet Apt. 456 Boston
Address 2 MA 02111

Where the domains are:

Domain Name Data Content


Name John Doe and Mary Doe
Address 123 Main Street Apt. 456
Area Boston, MA 02111

As the column sets and metadata labels do not necessarily provide
hard information about data content, preprocessing categorizes the
input data into domain-specific column sets: Name, Address, Area,
and Other.
Since the conventions used in name and address vary from one
country to the next, the Domain Pre-Processor is configured for a
single country.
See “Domain Pre-Processor Rule Sets” on page D-3 for more
information on the standardized data structures output from the
Domain Pre-Processor rule sets.


Preparing the Input File for the Domain Pre-Processor


The Domain Pre-Processor rule sets do not infer a data domain from
a field's position. Therefore, you must insert at least one metadata
delimiter for a field in your input record. It is strongly recommended
that you delimit every field or group of fields. The delimiter indicates
what kind of data you are expecting to find in the field based on one or
more of the following:
• Metadata descriptions.
• Investigation results.
• An informed guess.
The delimiter names are:

Delimiter name Description


ZQNAMEZQ Name delimiter
ZQADDRZQ Address delimiter
ZQAREAZQ Area delimiter

You can use up to six metadata delimited fields in a file.


You insert the literal using the Standardize Command Definition
dialog box as described in “Creating a Standardize Stage” on page
10-13.

Important: If you fail to enter at least one metadata delimiter for the input
record, you receive the following error message in the output file:

PRE-PROCESSOR ERROR - NO METADATA DELIMITERS WERE SPECIFIED
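The delimiter rules above (a delimiter precedes the field it describes, at least one delimiter is required, and at most six delimited fields are allowed) can be modeled with a short Python sketch. This is only an illustration of the rules, not how the Standardize Command Definition dialog box builds the record: the function name build_preprocessor_input is hypothetical, and joining the parts with single spaces is an assumption.

```python
# Delimiter names from the table above.
DELIMITERS = {"name": "ZQNAMEZQ", "address": "ZQADDRZQ", "area": "ZQAREAZQ"}
MAX_DELIMITED_FIELDS = 6


def build_preprocessor_input(fields):
    """Build one input record from (domain, value) pairs.

    `domain` is "name", "address", "area", or None for an undelimited field.
    """
    parts = []
    delimited = 0
    for domain, value in fields:
        if domain is not None:
            parts.append(DELIMITERS[domain])  # the delimiter precedes its field
            delimited += 1
        parts.append(value)
    if delimited == 0:
        raise ValueError("at least one metadata delimiter is required")
    if delimited > MAX_DELIMITED_FIELDS:
        raise ValueError("no more than six metadata delimited fields are allowed")
    return " ".join(parts)
```

For the John Doe record used earlier in this chapter, delimiting all three fields gives ZQNAMEZQ John Doe ZQADDRZQ 123 Main Street Apt. 456 ZQAREAZQ Boston, MA 02111.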


Domain-Specific Rule Sets


These rule sets evaluate the domain-specific input from a file for a
specific country (the U.S., for example). There are three
Domain-Specific rule sets for each country.

Rule Set Name Comments


NAME Individual and business names
ADDR Street name, number, unit, and other address
information
AREA City, state, region, and other locale information

The two-character country abbreviation is prefixed to the rule set.


Here are examples for the NAME rule set:

Rule Set Name Country


USNAME United States name
GBNAME Great Britain name
CANAME Canada name

See “Domain-Specific Rule Sets” on page D-10 for more information on
the standardized data structures output from the Domain-Specific
rule sets.
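The naming conventions for the country rule sets can be expressed as a small Python helper. The function name rule_set_names is hypothetical, and the sketch only composes names from the conventions described in this chapter; it does not check whether a rule set for a given country is actually supplied with QualityStage.

```python
DOMAIN_SPECIFIC_SUFFIXES = ("NAME", "ADDR", "AREA")


def rule_set_names(country_code):
    """Compose rule set names from a two-character country abbreviation."""
    code = country_code.upper()
    if len(code) != 2 or not code.isalpha():
        raise ValueError("country abbreviation must be two letters")
    return {
        # Domain Pre-Processor: country abbreviation followed by PREP
        "pre_processor": code + "PREP",
        # Domain-Specific: country abbreviation prefixed to NAME, ADDR, AREA
        "domain_specific": [code + suffix for suffix in DOMAIN_SPECIFIC_SUFFIXES],
    }
```

For the United States, the helper returns USPREP for the Domain Pre-Processor and USNAME, USADDR, and USAREA for the Domain-Specific rule sets.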


Validation Rule Sets


These rule sets validate the values and standardize the format of
common business data from a U.S. input file. The following Validation
rule sets are available:

Rule Set Name Comments


VDATE Dates that include day, month, and year
VEMAIL email addresses that have a user, domain, and top-level
qualifier
VPHONE U.S. phone numbers
VTAXID U.S. tax I.D.s or Social Security numbers

See “Validation Rule Sets” on page D-14, for more information on the
standardized data structures output from the Validation rule sets.

Standardized Results
At the end of a run, Standardize:
• Creates a fixed-format file and
• Adds the fields to the data file definition
Depending on the type of rule set, each field contains one data element
from the input file. There may be additional data such as SOUNDEX
or NYSIIS phonetic codes. You can use any of the additional data for
blocking and matching fields with Match or other matching jobs.
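To show why a phonetic code is useful for blocking, the following Python sketch implements the widely used American Soundex algorithm and groups names by their code. This is an assumption for illustration: the SOUNDEX variant that Standardize produces may differ in detail, and the function names soundex and block_by_soundex are hypothetical.

```python
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
_CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = [letters[0]]  # the first letter is kept as-is
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels have no code and reset `previous`
        if code and code != previous:
            encoded.append(code)
        previous = code
    return ("".join(encoded) + "000")[:4]  # pad or truncate to four characters


def block_by_soundex(names):
    """Group names into blocks that share a Soundex code."""
    blocks = {}
    for name in names:
        blocks.setdefault(soundex(name), []).append(name)
    return blocks
```

Names that sound alike, such as Robert and Rupert, receive the same code and therefore fall into the same block, which limits the number of record pairs a matching job has to compare.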
Using QualityStage Designer, you have the following options:
• Appending the input record to the end of the standardized output
record
• Appending none of the input fields
• Appending selected input fields as defined in the input data file
definition


Rules Overrides
The rule sets provided with QualityStage are designed to provide
optimum results. However, if the results are not satisfactory, you can
modify rule set behavior using rule override tables.
See Appendix E, “Customizing and Testing Rule Sets”, for information
about how to use override tables.

Defining Standardize Files


You define one input file and one results file for a Standardize stage.

Defining the Input File


The data in the input file can be in:
• Fixed-format fields
• Free-form fields
• A combination of both

Inserting Literals
If the input records do not include critical entries, you can insert the
required values as a literal, which will appear in the output file. You
insert the literal using the Standardize Command Definition dialog
box as described in “Creating a Standardize Stage” on page 10-13.
For example, suppose the input records lack a state entry because all
records are for the state of Vermont. To include the state in the
standardized records, you would insert the literal VT between the city
name and the ZIP code.
If input records have an apartment number field containing only an
apartment number, you could insert a # (pound sign) literal between
the unit type and the unit value.

Literals cannot contain any spaces and must be inserted between
fields. You cannot include two contiguous literals for a rule set.
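The rules for literals can be modeled with a short Python sketch that assembles the values passed to a rule set while rejecting a literal that contains spaces or that directly follows another literal. The function name assemble_rule_input, the item layout, and the sample city and ZIP code are hypothetical; in QualityStage you specify fields and literals in the Standardize Command Definition dialog box rather than in code.

```python
def assemble_rule_input(items):
    """Join fields and literals in order.

    `items` is a list of (kind, value) pairs, where kind is "field" or "literal".
    """
    values = []
    previous_kind = None
    for kind, value in items:
        if kind == "literal":
            if " " in value:
                raise ValueError("literals cannot contain spaces")
            if previous_kind == "literal":
                raise ValueError("two contiguous literals are not allowed")
        values.append(value)
        previous_kind = kind
    return " ".join(values)
```

For the Vermont example above, placing the literal VT between a city field and a ZIP code field produces a value such as BURLINGTON VT 05401.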

Delimiter Literals
You must insert field delimiters using literals for the Domain
Pre-Processor rule sets. A delimiter literal does not appear in the
output record. Add a metadata delimiter literal in front of at least one
field.
The delimiters are:

Delimiter Description
ZQNAMEZQ Name delimiter
ZQADDRZQ Address delimiter
ZQAREAZQ Area delimiter

Important: We strongly suggest that you enter a delimiter for every field or
group of fields in a record.

Defining the Results File


You need to define a file so that it appears in the Results File list of
the Stage Definition Wizard. When you define this file using the
Datafile Wizard, you use only the Add a New Datafile dialog box and
specify a name and description for this file.

Important: Do not add any fields to this data file definition. Standardize adds
the appropriate fields when you run the job.


Creating a Standardize Stage


To create a Standardize stage:

1. On the left pane of the QualityStage main window, select a Stages
folder.
2. Do one of the following:

• From the Toolbar, choose ➤ Stage ➤ Standardize.


• Right-click anywhere on the right pane, click New Stage, and
then click Standardize.
The Standardize Stage wizard appears.

3. Under Name, enter up to eight alphanumeric characters. The first
seven characters must be unique across all jobs.
4. Under Description, enter up to 40 characters.
5. Under Options, choose one of the following:
• Append All
• No Append
• Custom Append
6. (Optional) If you want to generate a predefined QualityStage
report for this job, select the Map Input Fields for Report Use
check box.
For information about predefined QualityStage reports, see
“Using Stage Wizards to Prepare Data for Predefined
QualityStage Reports” on page 15-2.
7. Under Default Output Format, choose one of the following:
• UPPERCASE ALL
• Preserve the case
• lowercase all
• Capitalize Every Word
8. Under Data File, select the input file.
9. Under Results File, select the results file.
The Data File must be a different file from the Results File.
10. Click Next.

The Standardize Wizard – Command definition dialog box
appears:

Selecting Rule Sets, Fields, and Literals


Use the Standardize Wizard – Command definition dialog box to select
the rule set and the fields for that job. You can also specify:
• Delimiter literals to insert into the output file fields for Domain
Pre-Processor rule sets
• Processing instructions for Domain-specific Name rule sets


To define the Standardize stage:

1. Next to Available Rule Sets, select a rule set.


2. Under Available Fields, select the field that this rule set is to
standardize, and then click .


The field appears under Standard Fields.
3. Repeat step 1 and step 2 to add additional fields for the current
job.
4. If you are using a Domain Pre-Processor rule set, enter one or
more delimiter literals as described in “Delimiter Literals” on page
10-12.
If you are using any other type of rule set (Domain-Specific or
Validation), you can also add literals as described in “Inserting
Literals” on page 10-11.
To add a literal, in the Literal field enter a value without spaces,
and then click the add button.


Note the following:
• A delimiter literal precedes the field that it is delimiting.
• The only special characters that you can use in a literal are:
# (pound sign)
% (percentage)
^ (caret)
& (ampersand)
< > (angle brackets)
/ (slash)
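
The restrictions on literals can be checked with a short pattern. This Python sketch is not part of QualityStage: is_valid_literal is a hypothetical name, and the sketch assumes that ASCII letters and digits are allowed in addition to the special characters listed above.

```python
import re

# Special characters permitted by the list above: # % ^ & < > /
# Assumption: ASCII letters and digits are also permitted.
_LITERAL_PATTERN = re.compile(r"[A-Za-z0-9#%^&<>/]+")


def is_valid_literal(value):
    """Return True if value has no spaces and only permitted characters."""
    return _LITERAL_PATTERN.fullmatch(value) is not None
```

For example, is_valid_literal("ZQNAMEZQ") is true, while a value that contains a space or a hyphen is rejected.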


Here is an example of a USPREP rule set in which the ZQNAMEZQ
delimiter is specified for the Names field and ZQADDRZQ for the
Street Addresses field:


5. (Optional) If you are using the Domain-Specific Name rule set, you
can choose the following process options:

Process Selection                  Description
Process All as Individual          All fields are standardized as individual
                                   names.
Process All as Organization        All fields are standardized as organization
                                   names.
Process Undefined as Individual    All undefined (unhandled) fields are
                                   standardized as individual names.
Process Undefined as Organization  All undefined (unhandled) fields are
                                   standardized as organization names.

This option is useful if you know what types of names your input
file contains. For instance, if you know that your file mainly
contains organization names, specifying Process All as
Organization enhances performance by eliminating the processing
steps of determining the name’s type.
6. (Optional) If you want to specify special case formatting rules,
select the With Case Formatting check box.
7. When all desired fields for the rule set are listed, click Add Rule.
If you selected the With Case Formatting check box, the Case
Formatting Options dialog box appears. For details about
specifying case formatting, see “Specifying Case Formatting
Options” on page 10-22.


The rule set and fields to be standardized appear under Scheduled
Processes as shown here:

Note: The order in which the rule sets appear in a scheduled process is
the order in which Standardize processes the fields.
A rule set can be scheduled only once in a Standardize process; for
example, you can specify only one USNAME rule set, one USAREA
rule set, and one USADDR rule set.
8. For additional rule sets, repeat step 1 through step 7.
9. (Optional) If you want to modify the case formatting rules for a
scheduled process, select it and click Edit Case Formatting
Options. The Case Formatting Options dialog box appears (see
“Specifying Case Formatting Options” on page 10-22).


10. When all rule sets are defined, do one of the following:
a. Click Finish if you selected Append All or No Append.
b. Click Next:
• If you selected Custom Append, the Append Field
Selection dialog box appears (see “Using the Append Field
Selection Dialog Box” on page 10-20).
• If you selected Map Input Fields for Report Use, the Data
Selection for Reports dialog box appears (see “Using the
Data Selection for Reports Dialog Box” on page 10-21).

Using the Append Field Selection Dialog Box


The Append Field Selection dialog box looks like this:

1. Select the field to be appended to the output file and click Add to
Append Fields.


The name of the field appears under Append Fields.


2. Repeat step 1 for additional fields.
3. When all desired fields are selected, click Finish.

Using the Data Selection for Reports Dialog Box


The Data Selection for Reports Dialog Box looks like this:

1. Select the field to serve as the record key for reporting, and then
click the add button.
The name of the field appears under RECKEY.
2. Repeat step 1 for up to six additional fields.
3. When all desired fields are selected, click Finish.


Specifying Case Formatting Options


If you select the With Case Formatting check box, the Case
Formatting Options dialog box appears when you click Add Rule. This
dialog box also appears when you click Edit Case Formatting Options.

You can apply specific case formatting rules to:


• All fields. When you click Apply to all fields, case formatting is
applied to all fields that Standardize generates in the output file for
the current scheduled process.
• Selected fields. When you click Apply to specific fields, you can
select the specific output fields you want to apply case formatting
to.


Using Classification Tokens to Specify Fields for Case Formatting


You can select one or more classification tokens to specify certain
output field types you want to apply case formatting to.
The classification tokens in the list are the tokens in the .CLS file for
the rule set of the current scheduled process.
For details about classification tokens, see “Classification Table” on
page C-6.

Applying Case Formatting Rules


You can apply case formatting rules to:
• Output record only
• Appended input only
• Both output record and appended input

Case Formatting Rule Sets


You can select any of the following standard case formatting rule sets
to apply:

Name            Description
UPPERALL        UPPERCASE ALL
PRESERVE        Preserve the case
LOWERALL        lowercase all
UPPEREACHWORD   Capitalize Every Word
CITY            Case formatting of city names
ENGEN           General rule for English: lowercasing of common words plus
                capitalization of all others
NAMES           Case formatting of personal names
TRADE           Case formatting of companies and trademark names

You can also create your own customized case formatting rule sets.


For information about case formatting rule sets, see Chapter 1 of the
QualityStage Stages Reference Guide.
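
To make the four generic rule sets concrete, the following Python sketch approximates their effect on a string. It is an illustration, not the QualityStage implementation: the function names are hypothetical, and the treatment of UPPEREACHWORD (lowercasing the remainder of each word) is an assumption. The CITY, ENGEN, NAMES, and TRADE rule sets depend on rule files and are not modeled here.

```python
def capitalize_every_word(text):
    # Capitalize the first letter of each space-separated word and
    # lowercase the rest (assumed behavior for UPPEREACHWORD).
    return " ".join(word.capitalize() for word in text.split(" "))


# Approximations of the generic case formatting rule sets.
CASE_RULES = {
    "UPPERALL": str.upper,
    "PRESERVE": lambda text: text,
    "LOWERALL": str.lower,
    "UPPEREACHWORD": capitalize_every_word,
}


def apply_case_rule(rule_name, text):
    """Apply one of the approximated rule sets to text."""
    return CASE_RULES[rule_name](text)
```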

Running Standardize Jobs


Add the stage to a job
Before you can use your Standardize stage, you must add it to a job.
You can either:
• Add your Standardize stage to an existing job, or
• Create a new job, and then add your Standardize stage to it.
For information about creating new jobs, see “Creating a New Job” on
page 6-4. For information about adding stages to jobs, see “Adding
Existing Stages to a Job” on page 6-6.
After adding your Standardize stage to a job, you can run it.

Setting up the file structure
If this is the first job that you run in the project, you must set up the
file structure for the project. See Chapter 7, “Deploying Jobs”, for
instructions.
Once your data files are available to QualityStage, run the
Standardize stage:

How to run the job
1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. From the jobs list on the right pane, select the job you want to run.
3. Do one of the following:

• On the Toolbar, click the Run button.


• Right-click the job you want to run, and then click Run.


The Job Run Options dialog box appears. It looks like this:

4. Select a run profile from the Profile list.


If you have not defined a run profile, click Setup. For information
about how to set up run profiles, see Chapter 5, “Setting Up Run
Profiles”.
5. Under Select Run Options, select Deploy, Run, or both.
See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.
6. (Optional) If you want to run a QualityStage formatted report
after you run the job, do the following:
a. Select Prepare Report Data.
This specifies that prepared report data output will be put in
the Data directory for the project.
b. Select Retrieve Report Data, and specify the maximum file
size to retrieve.


The output file will be copied to the location specified in the
run profile for local report data.
See Chapter 15, “Working with QualityStage Reports”, for more
information about preparing data for formatted reports.
7. (Optional) Click Advanced Run Options to see other options you
can set, depending upon:
• The job you are running
• The profile you are using
Select the options you need. For information about the advanced
run options, see “Advanced Run Options” on page 8-3.

Run mode
8. Click one of the following:


• Execute Data Stream Mode
• Execute File Mode
• Execute Parallel Extender Mode
See “About Run Modes” on page 7-2 for more information on the
three run modes.

Note: In Parallel Extender mode, if you intend to run a project built
with an earlier version of QualityStage (or INTEGRITY), you
must deploy the project using Parallel Extender mode before
you can run it.

Running in Data Stream Mode or Parallel Extender Mode


After you click either:
• Execute Data Stream Mode, or
• Execute Parallel Extender Mode,
QualityStage runs the job.
If you selected Wait for Completion, or if you are using a local
Windows server, status messages appear in the Status window.


When the job finishes running, a message like this one appears:

After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.


Running in File Mode


After you click Execute File Mode, the File Mode Execution screen
appears:

The File Mode Execution dialog box lists all the stages in the job. You
must run the same stages that you selected when you deployed the
job. See “Deploying Jobs in File Mode” on page 7-7 for more
information.
By default, all stages listed are run from first to last. However, you
can select a subset of stages to run. For information about how to
select a subset, see “To select a subset” on page 8-7.

1. Under Select Run Options:
a. Select Deploy, Run, or both.


See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.
b. (Optional) Select Prepare Report Data.
This specifies that prepared report data output will be put in
the Data directory for the project.
c. (Optional) Select Retrieve Report Data, and specify the
maximum file size to retrieve.
The output file will be copied to the location specified in the
run profile for local report data.
See Chapter 15, “Working with QualityStage Reports”, for more
information about preparing data for formatted reports.

To run the job
2. Click Run From Start to End.
The progress of the run is noted in the Status box.
When the run has successfully finished, a message like this one
appears:

After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.

Standardizing a Multinational Address File Using Standardize


If your input file contains multinational address data, we strongly
recommend that you use the Multinational Standardize stage rather
than the Standardize stage. Multinational Standardize creates a
single data structure that, in turn, creates one job stream for
standardization and matching.


However, you can follow these steps to standardize a multinational
input file:

1. Run a Standardize job using the Country Identifier rule set, which
creates an intermediate file in which an ISO country code is
appended to each record.
2. Run a job using the Select stage to create an intermediate file that
contains records from a single country. The Select stage can use
the ISO country code field to create this file.
3. Run the Standardize job using the appropriate Domain
Pre-Processor rule set with the intermediate file.
4. Complete standardization of the file by running the Standardize
job using the appropriate Domain-Specific rule sets.

About the Country Identifier Rule Set


The Country Identifier rule set, named COUNTRY, prepares
multinational input files for standardization at the individual country
level. The rule set creates an output file in which the following fields
are appended to the beginning of each input record:
• A two-byte ISO country code. The code is associated with the
geographic origin of the record’s address and area information.
• An Identifier flag. The values are:

Flag Description
Y The rule set was able to identify the country.
N The rule set was not able to identify the country and used the default
value that you set as the default country delimiter.

After you create this output file, you can use a Select stage to create a
file comprising only one country, which can be used with a Domain
Pre-Processing rule set for the appropriate country.
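
As a rough illustration of the Select step described above, the Python sketch below partitions records by the two-byte ISO country code at the start of each record. It is not QualityStage code: the function names are hypothetical, and the sketch assumes that the Identifier flag is a single byte that immediately follows the country code, which the text above does not state.

```python
def split_by_country(records, country="US"):
    """Partition records on the two-byte ISO code at the start of each one."""
    accepted, rejected = [], []
    for record in records:
        code = record[:2]  # two-byte ISO country code added by the rule set
        if code == country:
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


def country_was_identified(record):
    # Assumption: the Identifier flag (Y or N) is the third byte.
    return record[2:3] == "Y"
```

The accepted list corresponds to the single-country file that is passed to the Domain Pre-Processor rule set; the rejected list holds the records for all other countries.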


Using the Country Identifier Rule Set


The following diagram illustrates the processing sequence in which
you use the Country Identifier rule set in Standardize and a Select
Operation in a job to obtain a file that contains only United States
records for use in the Domain Pre-Processor rule set.

Input File
    |
    v
Standardize Job (Country Identifier Rule Set)
    Assign each record in the file with a two-character ISO country code
    |
    v
Intermediate File
    Input records with ISO country code
    |
    v
Job (Select Stage)
    Split input records into United States and non-U.S. by selecting
    "US" in country code field
    |
    +--> ACCEPT: United States Input Records
    +--> REJECT: Non-U.S. Input Records

Preparing the Input File for the Country Identifier


The Country Identifier rule set uses a default country delimiter when
the rule set cannot determine the record’s country of origin. This
delimiter consists of a default country code, which you must define
before you run the rule set in a job. Use the country code that
you believe represents the majority of the records.
The delimiter name is:
ZQ<two-character ISO code>ZQ
For example, use ZQUSZQ as the United States delimiter.
For a list of ISO codes, see Appendix F, “ISO Country Codes”.
This delimiter is inserted as a literal when you define the fields you
are using from your input file. See “Selecting Rule Sets, Fields, and
Literals” on page 10-15 for more information.
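
The delimiter format is simple enough to generate programmatically. The following Python sketch builds a delimiter from a two-character code; country_delimiter is a hypothetical helper, and the sketch checks only the shape of the code, not whether it is an assigned ISO country code.

```python
def country_delimiter(iso_code):
    """Build a ZQ<code>ZQ delimiter from a two-character ISO country code."""
    code = iso_code.strip().upper()
    # Shape check only; this does not verify that the code is assigned.
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError("expected a two-character ISO country code")
    return f"ZQ{code}ZQ"
```

For example, country_delimiter("us") returns ZQUSZQ.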

Important: If you fail to specify a country delimiter, the word ERROR appears
at the beginning of every line in each output record.

Managing the Rule Sets and Files


QualityStage provides the Rules Management dialog box, which
allows you to view and manage the rule sets and the files in the rule
sets. With this dialog box, you can:
• Create new rule sets.
• Copy existing rule sets.
• Rename rule sets.
• Delete rule sets.
Within a rule set, you can:
• Edit rule set files.
• Copy files.
• Move files to other rule sets.
• Rename files.
• Delete files from rule sets.
When you rename a rule set, QualityStage renames the directory and
the .prc, .dct, .cls, and .pat files to the new name. When you delete a rule
set, QualityStage deletes the directory and all files in the directory.
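
The rename behavior can be pictured as the following file-system operations. This Python sketch is only a model of what the paragraph above describes: rename_rule_set and rules_root are hypothetical names, and the sketch assumes that other files in the directory (for example, lookup tables) keep their names, which the guide does not state.

```python
from pathlib import Path

RULE_SET_EXTENSIONS = (".prc", ".dct", ".cls", ".pat")


def rename_rule_set(rules_root, old_name, new_name):
    """Rename a rule set directory and its four core files."""
    old_dir = Path(rules_root) / old_name
    new_dir = Path(rules_root) / new_name
    if new_dir.exists():
        raise FileExistsError(f"rule set already exists: {new_name}")
    old_dir.rename(new_dir)
    for path in list(new_dir.iterdir()):
        is_core_file = (path.stem.lower() == old_name.lower()
                        and path.suffix.lower() in RULE_SET_EXTENSIONS)
        if is_core_file:
            # Keep the original extension, change only the base name.
            path.rename(path.with_name(new_name + path.suffix))
    return new_dir
```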


Accessing the Rules Management Dialog Box


To access the Rules Management dialog box, choose
Rules ➤ Standardization Rules Management.
The files and tables appear in a tree structure under a folder for each
rule set. Here is how the Great Britain Area Domain-Specific rule set
and its files appear:

Note that the tree contains the rule set tables and files (CLS, PRC,
PAT, and DCT), and the Override tables.

Note: GBAREA also includes lookup tables.


Viewing or Modifying Rule Set Files and Tables


You can view or modify the contents of a file or table in Notepad by
clicking on a file icon in the tree. For details on the contents of the rule
set files and tables, see Appendix C, “Rule Set Files”.

Important: We strongly recommend that you not edit the QualityStage
pre-built rule set files and tables. To alter rule set behavior, use
the override table dialog boxes, which you access in QualityStage
Designer. Doing so ensures that edits are syntactically correct and
can be easily ported when you perform a QualityStage (or INTEGRITY)
upgrade. See Appendix E, “Customizing and Testing Rule Sets”.

Creating New Rule Sets

How to create a new rule set
To create a new rule set:
1. From the QualityStage main window, choose Rules ➤
Standardization Rules Management.


The Rules Management dialog box appears.

2. Click New RuleSet.


The New RuleSet dialog box appears.

3. Enter a name for the rule set.


This name also becomes the name of the directory where the files
reside.
4. Click OK.
QualityStage creates a directory in the rule set directory and adds
four empty files with the same name and the extensions: PRC,
PAT, DCT, and CLS.
5. Edit the rule set files to make the necessary changes.
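
The result of step 4 can be modeled as follows. This Python sketch is illustrative and does not call QualityStage: create_rule_set and rules_root are hypothetical names, and the use of uppercase file extensions is an assumption based on how the extensions are written in step 4.

```python
from pathlib import Path


def create_rule_set(rules_root, name):
    """Create a rule set directory that holds four empty rule files."""
    rule_dir = Path(rules_root) / name
    rule_dir.mkdir()  # raises FileExistsError if the directory exists
    for extension in ("PRC", "PAT", "DCT", "CLS"):
        # Each file takes the rule set name plus one of the extensions.
        (rule_dir / f"{name}.{extension}").touch()
    return rule_dir
```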
Optionally, you can create a new rule set by copying an existing one.

How to copy a rule set
To copy an existing rule set:
1. Select the rule set to be copied.
2. Do one of the following:
• Click Copy.
• Click the right mouse button and select Copy RuleSet.
The Copy RuleSet dialog box appears.

3. Enter a name for the new rule set.


QualityStage creates a directory in the rule set directory and
copies all files from the existing rule set.
4. Edit the rule set files to make the necessary changes.

How to edit a rule set file
To edit the rule set files:
1. Expand the rule set and select a file.
2. Click Edit or double-click the selected file.
QualityStage opens the file with Notepad.


Note: Do not use Microsoft Word or similar word processing tools to
create or edit these files.

When you copy a file, you create a copy of the file within its rule set
with a new name. When you move a file, you move it from one rule set
to another.

How to move a file
To move a file:
1. Select the file to be moved.
2. Drag and drop it to the desired rule set.
To copy a file to another rule set, you must make a copy of it in its
current rule set, then move the copy to the destination rule set.
Optionally, you can rename the moved file.

Tip: To copy, rename, or delete a rule set or file, you can also
right-click the rule set or file, and then select the desired action
from the shortcut menu.

11

Defining Multinational
Standardize Stages

For files that contain multinational addresses, QualityStage provides
the following tools:
• The Multinational Standardize stage to standardize address files
at city- and street-levels in one step.
• The QualityStage WAVES™ (Worldwide Address Verification and
Enhancement System) stage to standardize, correct, verify, and
enhance addresses against country-specific postal reference files in
one step. For information about QualityStage WAVES, see the
QualityStage WAVES Stage Guide.

Important: QualityStage WAVES requires an additional license and postal
files that you must purchase from Ascential Software
Corporation. Contact your Ascential Software representative for
information on upgrading to QualityStage WAVES.

This chapter describes how to use the Multinational Standardize stage.


The Multinational Standardize Stage


Use the Multinational Standardize stage to standardize a
multinational address file in one step.
For example, your input file might contain address records from the
U.S., Canada, Latin America, and Europe. Using this stage, you can
standardize all the records without having to aggregate records from
each country into separate files and standardize them using
country-specific rule sets.
Multinational Standardize uses country names, abbreviations, or ISO
country codes in the input file to apply country-appropriate
standardization rules.
Unlike the Standardize stage for standardizing multinational
addresses (see “Standardizing a Multinational Address File Using
Standardize” on page 10-29), there is no need to use a
Country Identifier rule set. The one-step Multinational Standardize
stage eliminates multiple-output files, thus simplifying any
subsequent file-handling and processing.

Which Countries Can Be Standardized


Addresses from more than 200 countries can be standardized at the
city-level; of these, more than 50 can be standardized at the
street-level. For a complete list of available countries, see your
QualityStage Release Notes.

City-Level Standardization
For more than 200 countries, the job does the following:
• Separates street-level address information from city-level
information (if necessary)
• Assigns City, Locality, Province/State, Postal Code, and Postal
Code add-on (ZIP4) to separate fields
• Assigns ISO country codes (2 and 3 byte versions)


• Assigns City Name phonetics (enhanced NYSIIS and REVERSE
SOUNDEX) for matching incorrectly spelled city names.
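
The guide does not describe the phonetic algorithms themselves. As an illustration only, the following Python sketch applies standard American Soundex to the reversed name, which is one common reading of "reverse Soundex". The enhanced NYSIIS and REVERSE SOUNDEX codes that QualityStage actually produces may differ, and the function names here are hypothetical.

```python
# Standard American Soundex digit groups.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
           "4": "L", "5": "MN", "6": "R"}
_CODES = {letter: digit
          for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name):
    """Return the four-character standard Soundex code for name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for letter in letters[1:]:
        code = _CODES.get(letter, "")
        if code and code != previous:
            result += code
        # H and W do not separate letters that share a code; vowels do.
        if letter not in "HW":
            previous = code
    return (result + "000")[:4]


def reverse_soundex(name):
    # Assumption: reverse Soundex is Soundex of the reversed name.
    return soundex(name[::-1])
```

Because the code is built from the end of the name, two spellings that differ near the beginning or in their vowels can still receive the same code. In this sketch, "Boston" and "Bostan" both map to N321.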

Street-Level Standardization
For more than 50 countries, the job does the following in addition to
city-level standardization:
• Separates street information into discrete fields, including house
number, street name, and so on.
• Assigns floor and unit information to the Secondary Address
Information field.
• Assigns building, contact, and address type to their respective
fields.
• Assigns unhandled address information to a field.
• Assigns unprocessed street address information to the Unprocessed
Address field when no address standardization rules are available.
For a complete description of the fields in the Multinational
Standardize output file, see “Multinational Standardize Output
Fields” on page 11-15.

Modifying Standardization Behavior


The rule sets provided with QualityStage are designed to provide
optimum results. However, if the results are not satisfactory, you can
modify standardization behavior using rule override tables.
See Appendix E, “Customizing and Testing Rule Sets”, for information
about how to use override tables.


Input File Requirements and Recommendations

Requirements
Input files must be fixed-field, fixed-record-length data files. The total
line length of any input record can be no greater than 4096 columns;
the address data must occur within the first 3072 columns.
Each record must contain a country indicator, which may be the full
spelling, an abbreviation, or the 2- or 3-byte ISO country code (see “ISO
Country Codes” on page F-1). If the country indicator does not match
the expected country-level or street-level formats for the indicated
country, the data is not standardized and is output as unhandled. For
example, if the country indicator is U.S. and the address format is that
of France, the record is not standardized.
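
The size limits above can be checked before a job is run. The following Python sketch is a hypothetical pre-check, not a QualityStage utility. It assumes that one column corresponds to one character, that records are supplied without line terminators, and that address_end_column is the 1-based last column that holds address data.

```python
MAX_RECORD_COLUMNS = 4096   # maximum total record length
MAX_ADDRESS_COLUMNS = 3072  # address data must end at or before this column


def check_input_records(records, address_end_column):
    """Return a list of problems found in fixed-length input records."""
    problems = []
    lengths = {len(record) for record in records}
    if len(lengths) > 1:
        problems.append("Records do not all have the same length.")
    if lengths and max(lengths) > MAX_RECORD_COLUMNS:
        problems.append("A record is longer than 4096 columns.")
    if address_end_column > MAX_ADDRESS_COLUMNS:
        problems.append("Address data extends past column 3072.")
    return problems
```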

Recommendations
We strongly suggest that you use a preprocessor to remove any
nonaddress or noncontact data from the address fields. Any
information other than address information is not handled by the
standardization rules. This extraneous data should be removed from
the file before you run the job.
Addresses should include the following information:
• Street address
• City
• State or Province
• Postal code
• Country code or name (required)
See the next section, “Input Field Configuration”, on how this
information can be organized into fields in the input file.


A record can contain contact information such as C/O, Attn.,
Department, and so on. This information is standardized and placed
in the Contact Information field of the output file.

Input Field Configuration


The fields in the input file must be configured in one of the following
ways:
• One to five Address fields. They must include all city, state, street,
postal code, and country information.
• One to five Address fields, one combined
City/State-Province/Postal Code field, and one Country field.
• One to five Address fields, one City field, one State/Province field,
one Postal Code field, and one Country field.
The Multinational Standardize Stage wizard lets you define your
input file configuration as you build your job. Use the configuration
that most closely matches your input file.
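
The three configurations can be summarized as a small decision function. This Python sketch is illustrative only: input_entry_option and its parameter names are hypothetical, and the returned strings are the Input Entry Option labels that the wizard shows for each configuration.

```python
def input_entry_option(address_fields, combined_csp=None, city=None,
                       state=None, postal=None, country=None):
    """Return the wizard option that matches a set of input field names."""
    if not 1 <= len(address_fields) <= 5:
        raise ValueError("one to five Address fields are required")
    separate = (city, state, postal)
    if combined_csp is None and country is None and not any(separate):
        # All city, state, postal code, and country data is in Address fields.
        return "Combined City/State/Street/Postal Code"
    if combined_csp is not None and country is not None and not any(separate):
        return "Combined City/State/Postal Code"
    if combined_csp is None and country is not None and all(separate):
        return "Separate Field Entry"
    raise ValueError("fields do not match a supported configuration")
```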

Creating a Multinational Standardize Stage


To create a Multinational Standardize stage:

1. On the left pane of the QualityStage main window, select a Stages
folder.
2. Do one of the following:
• From the Toolbar, choose ➤ Stage ➤ Multinational Standardize.
• Right-click anywhere on the right pane, click New Stage, and
then click Multinational Standardize.


The Multinational Standardize Stage Wizard appears.

3. Under Name, enter up to eight alphanumeric characters. The first
seven bytes must be unique across all jobs.
4. Under Description, enter up to 40 characters.
5. Under Data File, select the input file.
6. Under Results File, select the results file.
The Data File must be a different file from the Results File. See
“Selecting Rule Sets, Fields, and Literals” on page 10-15 for more
information.
7. Click Next.


The Multinational Standardize Wizard appears:

8. Under Input Entry Options, select the appropriate option, based
on the configuration of your input file (see “Input Field
Configuration” on page 11-5 for more information).


The Wizard activates the required input fields based on your
selection:

Input Entry Option                       Required Fields
Combined City/State/Street/Postal Code   Street Name (up to five; must contain a
                                         country identifier)
Combined City/State/Postal Code          Street Name (up to five)
                                         City/State/Postal
                                         Country
Separate Field Entry                     Street Name (up to five)
                                         City
                                         State/Province
                                         Postal Code
                                         Country

9. Select an Available Field, and then click the add button to enter
the data into the corresponding field on the right.

Note: Use the remove button to unselect selected fields.


Here is an example of a completed Wizard:

10. When you are done entering information, click Finish.
The output fields are created in the output file definition.

Running Multinational Standardize Jobs


Add the stage to a job
Before you can use your Multinational Standardize stage, you must
add it to a job. You can either:
• Add your Multinational Standardize stage to an existing job, or
• Create a new job, and then add your Multinational Standardize
stage to it.


For information about creating new jobs, see “Creating a New Job” on
page 6-4. For information about adding stages to jobs, see “Adding
Existing Stages to a Job” on page 6-6.
After adding your Multinational Standardize stage to a job, you can
run it.

Setting up the file structure
If this is the first job that you have run in the project, you need to set
up the file structure for the project. See Chapter 7, “Deploying Jobs”,
for instructions.
Once your data files are available to QualityStage, run the
Multinational Standardize stage:

How to run the job
1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. From the jobs list on the right pane, select the job you want to run.
3. Do one of the following:

• On the Toolbar, click the Run button.


• Right-click on the job you want to run, and then click Run.


The Job Run Options dialog box appears. It looks like this:

4. Select a run profile from the Profile list.


If you have not defined a run profile, click Setup. For information
about how to set up run profiles, see Chapter 5, “Setting Up Run
Profiles”.

Important: If your server is running on a UNIX platform, you must enter
a German ISO locale in the Alternate Locale field of your run
profile (for example, de_DE.ISO8859-1). See “Defining a
UNIX or Windows Run Profile” on page 5-10 for more
information.

5. Under Select Run Options, select Deploy, Run, or both.


See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.


6. Enter the Multinational Rules Location. The default is
\Ascential\QualityStageServer<version>\Apps\MNV\Controls
Use the browse button to browse for the location.


7. (Optional) Click Advanced Run Options to see other options you
can set, depending upon:
• The job you are running
• The profile you are using
Select the options you need. For information about the advanced
run options, see “Advanced Run Options” on page 8-3.

Run mode
8. Click one of the following:


• Execute Data Stream Mode
• Execute File Mode
• Execute Parallel Extender Mode
See “About Run Modes” on page 7-2 for more information on the
three run modes.

Note: In Parallel Extender mode, if you intend to run a project built
with an earlier version of QualityStage (or INTEGRITY), you
must deploy the project using Parallel Extender mode before
you can run it.

Running in Data Stream Mode or Parallel Extender Mode


After you click either:
• Execute Data Stream Mode, or
• Execute Parallel Extender Mode
QualityStage runs the job.
If you selected Wait for Completion, or if you are using a local
Windows server, status messages appear in the Status window.


When the job finishes running, the following message appears:

After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.


Running in File Mode


After you click Execute File Mode, the File Mode Execution screen
appears:

The File Mode Execution dialog box lists all the stages in the job. You
must run the same stages that you selected when you deployed the
job. See “Deploying Jobs in File Mode” on page 7-7 for more
information.
By default, all stages listed are run from first to last. However, you
can select a subset of stages to run. For information about how to
select a subset, see “To select a subset” on page 8-7.

1. Under Select Run Options, select Deploy, Run, or both.


See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.

2. Click Run From Start to End to run the job.


The progress of the run is noted in the Status box.
3. When the run has successfully finished, a message like this one
appears:

After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13.
You can view the output file you defined in your job as described in
“Using the QualityStage Data File and Report Viewer” on page 16-1.
You can also use the QualityStage report as described in
“Standardization WAVES/Multinational Report” on page 15-28.

Multinational Standardize Output Fields


The following table describes the fields in the Multinational
Standardize output file at both the city and street levels.

Domain        Field                            Length  Start     Comments
                                                       Position
City-level    City                             28      1
              Neighborhood/Locality            40      29
              State/Province                   3       69
              Postal Code                      10      72        ZIP for U.S.
              ZIP4 or Additional               4       82
                Sorting/Routing Information
              ISO Country Code (alpha 2)       2       86        See “ISO Country Codes” on page F-1
              ISO Country Code (alpha 3)       3       88        See “ISO Country Codes” on page F-1
              City NYSIIS                      8       91        NYSIIS phonetic spelling; may be used in
                                                                 matching for deduplication.
              City RSNDX                       4       99        Reverse SOUNDEX spelling; may be used in
                                                                 matching for deduplication.
              reserved                         1       103
              Area Verification Indicator      1       104       Used in QualityStage WAVES only.
              Area Match Pass                  1       105       Used in QualityStage WAVES only.
Street-level  House #                          15      106       Includes prefix # and suffix #
              Prefix Directional               3       121       Such as N for North, S for South, etc.;
                                                                 language-specific for each country.
              Prefix Type                      20      124       Includes highway and route prefixes such as
                                                                 Rue de la Mer, where Rue is the prefix.
              Street Name                      35      144
              Suffix Type                      15      179       Such as Ave., Rd., etc.;
                                                                 language-specific for each country.
              Suffix Directional               3       194       Such as N for North, S for South, etc.;
                                                                 language-specific for each country.
              Box Type                         15      197       Such as P.O. Box, Box, etc.;
                                                                 language-specific for each country.
              Box Value                        10      212       Alphanumeric
              Secondary Address Info           50      222       Unit, floor and multi-unit info.
              Building Name                    30      272       For example, Empire State Building,
                                                                 Rockefeller Plaza, etc.
              Contact Info                     60      302       Includes attention, C/O and department
                                                                 information.
              Address Type Indicator           2       362       S = Street only
                                                                 B = Box info
                                                                 L = Building info
                                                                 G = General Delivery
                                                                 Y = Secondary address info
                                                                 O = Other
              Unhandled Text                   50      364       Any information that does not conform to
                                                                 an expected address output field.
              reserved                         10      414
              reserved                         10      424
              Address Verification Indicator   1       434       Used in QualityStage WAVES only.
              Address Match Weight             7       435       Used in QualityStage WAVES only.
              Address Match Pass               1       442       Used in QualityStage WAVES only.
              reserved                         6       443
              Unhandled Address                150     449       Address data not standardized.
              NYSIIS of Street Name Root       8       599       The phonetic spelling of the street name
                                                                 root; for example, for Rue de la Mer, the
                                                                 root is Mer. This information can be used
                                                                 in matching for deduplication.
              Appended Input file              n       607       The original input file

About the Output File


Note the following about the Multinational Standardize output file:
• The total length of the standardized record is 606 bytes; the input
record is appended to the end of the record, starting at byte 607.
• The output fields are automatically created in the output file
definition after you press the Finish button in the Multinational
Standardize wizard.
• The output is not in a format that can be used directly as a mailing
list; the file is in a parsed business intelligence format. Further
processing may be required to render the records into a “mailable”
format.
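Because the output record is fixed format, you can read it by slicing on the start positions and lengths listed in the table. The following Python code is a minimal sketch of that idea and is not part of QualityStage. The names LAYOUT, parse_record, and build_sample are hypothetical, only a few of the fields are listed, and the sketch assumes single-byte text so that one character equals one byte.

```python
# Hypothetical helper: (field name, 1-based start position, length), copied
# from the Multinational Standardize output field table. Only a subset of the
# fields is shown here.
LAYOUT = [
    ("city", 1, 28),
    ("neighborhood_locality", 29, 40),
    ("state_province", 69, 3),
    ("postal_code", 72, 10),
    ("iso_country_alpha2", 86, 2),
    ("house_number", 106, 15),
    ("street_name", 144, 35),
    ("address_type_indicator", 362, 2),
    ("street_name_root_nysiis", 599, 8),
]

STANDARDIZED_LENGTH = 606  # the appended input record starts at byte 607


def parse_record(record: str) -> dict:
    """Slice one output record into named fields, stripping pad spaces."""
    fields = {
        name: record[start - 1:start - 1 + length].strip()
        for name, start, length in LAYOUT
    }
    # Everything after the 606-byte standardized area is the original input.
    fields["appended_input"] = record[STANDARDIZED_LENGTH:].rstrip("\n")
    return fields


def build_sample() -> str:
    """Build a synthetic record (made-up data) to exercise parse_record."""
    buf = [" "] * STANDARDIZED_LENGTH
    for start, text in [(1, "PARIS"), (72, "75001"), (86, "FR"), (144, "MER")]:
        buf[start - 1:start - 1 + len(text)] = text
    return "".join(buf) + "12 RUE DE LA MER 75001 PARIS"


parsed = parse_record(build_sample())
print(parsed["city"], parsed["postal_code"], parsed["appended_input"])
```

To read other fields, add their names, start positions, and lengths to LAYOUT.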



12

Defining Match Stages

Matching data is the second step in Phase Three of the data
re-engineering workflow. Remember, Phase Three is about the design
and development of the data re-engineering application. Following the
four-phase process discussed in Chapter 2, “The Workflow for
Creating Re-engineered Data”, will streamline your data
re-engineering implementation.

Figure 12-1 Phase Three: Design and Develop the Data Re-engineering
Application

Matching data helps you identify duplicate entities within one or more
files, which you need to know later on, in Step Three. This step also
lets you establish cross-reference linkage and enrich existing data
with new attributes from external sources.


QualityStage includes the Match stage for defining match criteria.
This stage locates duplicate records by matching data in one or more
fields.
This chapter explains how to use the Match stage. For best results, it
is recommended that you first condition the source data files, as
described in Chapter 10, “Defining Standardize Stages”.

About Matching Data


Your data re-engineering assignment determines your matching
strategy. Once you know what you’re looking for, whether it’s to match
individuals, match companies, perform householding, or reconcile
inventory transactions, you can design a matching strategy to meet
these goals.
Matching is a two-step process: first you block records and then you
match them. Blocking identifies pairs of records that you want to
ignore during matching. When you use the Match stage, you define
what constitutes a match; you set which fields must match and how
similar they must be.
The Match stage provides:
• Individual matching
• Geographic coding
• Many-to-one matching
• Unduplication services
• Single file grouping
The Match stage also provides:
• Data management
• Report generation
• Clerical review
• Probability analysis functions
The Match stage matches fixed-format files that are the result of a
Standardize stage. The results from a Match stage can be used for the
next stages of your data re-engineering project, such as Survive, or for
loading into a database.
For more information about using Match stages in an OS/390 MVS
environment, see the QualityStage OS/390 Server Guide.

Using Match Stages


Using the Match stage, you complete three steps:

1. Define your input files.


2. Specify how you want the input records to match. You can indicate
which fields are important, how to group records, and which fields
to use for weights and penalties.
• First you partition the data into subsets or blocks. This is
called blocking.
• QualityStage then associates a record on one file to a record on
another file or, in the case of unduplicating a file, on the same
file. This is called matching.
3. Define the output.
In addition, the Match stage provides functions that:
• Generate a statistical report on the results of a match run
• Generate a report listing the actual results of the match run
• Extract the data, when all data is completely matched, to use
for the next stages of your data re-engineering project
Match provides three basic matching processes:
• One-to-one matching.
• Many-to-one matching.
• Unduplication.
These three processes handle the common data issues you encounter
during a re-engineering application. The type of data you work with,
and your business rules, dictate which matching process is
appropriate for your needs.


For the first two processes, the Match stage uses two files: FileA and
FileB. For the third process, the Match stage uses only one data file,
FileA.

One-To-One Matching
For the one-to-one matching process, either input file can serve as
FileA or FileB. This matching process identifies all records on one file
that correspond to a record for the same individual, event, household,
street address, etc., on the second file. Only one record on FileB can
match a single record on FileA, because you are matching individual
events.

Many-To-One Matching
For the many-to-one matching process, multiple records on FileA can
match a single record on FileB. With these matching jobs, FileB is
considered a reference file. An example of many-to-one matching is
matching a transaction file to a master file, where you can have many
transactions for one person on the master file.
Another example of many-to-one matching processes is geographic
coding. These processes match a file containing street addresses either
to a Post Office ZIP code file to obtain ZIP codes, or to a Census TIGER
file to obtain latitude-longitude coordinates or census tract
information.
When matching, one or more fields on FileA must have equivalent
fields on FileB. For example, if you want to match on last name and
age, both FileA and FileB must have a field for last name and a field
for age. The location and length of the fields can be different in the two
files.
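The point that equivalent fields can have different locations and lengths in the two files can be illustrated with a short Python sketch. The layouts, field names, and the extract function below are hypothetical and are used only for illustration; in QualityStage you define these positions in the data file definitions.

```python
# Hypothetical layouts: field name -> (1-based start position, length).
# The same logical fields sit at different positions in each file.
LAYOUT_A = {"last_name": (1, 20), "age": (21, 3)}
LAYOUT_B = {"age": (1, 2), "last_name": (3, 15)}


def extract(record: str, layout: dict, field: str) -> str:
    """Return one fixed-format field from a record, without pad spaces."""
    start, length = layout[field]
    return record[start - 1:start - 1 + length].strip()


rec_a = "SMITH".ljust(20) + " 42"   # FileA: name first, 3-byte age
rec_b = "42" + "SMITH".ljust(15)    # FileB: 2-byte age first, then name

for field in ("last_name", "age"):
    same = extract(rec_a, LAYOUT_A, field) == extract(rec_b, LAYOUT_B, field)
    print(field, same)
```

Both comparisons report agreement even though the two records are laid out differently.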


Matching for Unduplication


Unduplication services and single file grouping jobs locate all records
that apply to the same individual, household, event, etc. For these
jobs, you use only FileA.

Blocking Phase
The blocking phase limits the number of record pairs being examined,
increasing the efficiency of the matching. This phase creates a subset
or block of records that have a high probability of being associated
with or linked to other records during the matching phase. Blocking
identifies pairs of records that have a low probability of matching to be
ignored during the matching phase.
During the blocking phase, all records having the same value in the
blocking fields are eligible for comparison during the matching phase.
For example, if LAST_NAME is a blocking field, all persons with the
same last name on the two files are included in the block for the
matching phase. Records having different last names are not included.
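The effect of blocking on the number of record pairs can be sketched in a few lines of Python. This is an illustration of the general technique, not the Match stage's internal implementation; the function names and the sample records are made up.

```python
from collections import defaultdict
from itertools import product


def block_records(records, blocking_fields):
    """Group records by the tuple of their blocking-field values."""
    blocks = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in blocking_fields)
        blocks[key].append(rec)
    return blocks


def candidate_pairs(file_a, file_b, blocking_fields):
    """Yield only the (FileA, FileB) pairs that share a blocking key."""
    blocks_a = block_records(file_a, blocking_fields)
    blocks_b = block_records(file_b, blocking_fields)
    for key in blocks_a.keys() & blocks_b.keys():
        yield from product(blocks_a[key], blocks_b[key])


file_a = [
    {"id": "A1", "LAST_NAME": "SMITH"},
    {"id": "A2", "LAST_NAME": "JONES"},
]
file_b = [
    {"id": "B1", "LAST_NAME": "SMITH"},
    {"id": "B2", "LAST_NAME": "SMITH"},
    {"id": "B3", "LAST_NAME": "BROWN"},
]

pairs = sorted(
    (a["id"], b["id"])
    for a, b in candidate_pairs(file_a, file_b, ["LAST_NAME"])
)
print(pairs)  # [('A1', 'B1'), ('A1', 'B2')]
```

Without blocking, the two files would produce 2 x 3 = 6 pairs to compare. Blocking on LAST_NAME leaves only the two SMITH pairs.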

The Strategy For Using Match Passes


Any records not included in the matching phase become residuals and
are the input for the next blocking and matching pass. With
subsequent passes, you should specify different blocking fields.
You should select a blocking strategy that creates small blocks of
records, approximately 10 to 20 records per file. Blocks should not
exceed 100 records. One method to make sure that you are creating
small blocks is to use several fields for blocking in each pass.
Your strategy should also attempt to match the majority of records on
the first pass, so that you have fewer records to be processed in
subsequent passes. Therefore the first pass should be as restrictive as
possible, so select fields for blocking that have the largest number of
distinct values and the highest reliability.


Errors in selecting blocking strategies are very common and can be
serious. The following example illustrates a blocking strategy that
uses multiple fields:
Two files have the following fields:
Last Name
Middle Initial
First Name
Sex
Birth Year
Birth Month
Birth Day
For the first pass, you block on Last Name, Sex, and Birth Year. This pass
creates blocks of people with the same last name and sex who were
born in the same year. Any errors in the data on one or more fields
(such as records that should match but did not or records that did
match and should not have) can be located in the second pass. With
the second pass, you block on Birth Month, Birth Day, first letter of the
First Name, and Middle Initial.
Make sure that the fields you pick for blocking are defined
appropriately for missing values; that is, how missing values are
represented, such as spaces, zeroes, all nines. Match skips blocks that
have fields with missing values. Also make sure that you have defined
how missing values are specified with each field in your files. (See
“Creating a Datafield Definition” on page 4-21 for more information.)
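The two-pass strategy and the treatment of missing values described above can be sketched as follows. This is a simplified illustration in Python, not the Match stage's implementation. The is_missing rule (blanks, all zeroes, or all nines) is an assumption made for the example; in QualityStage the missing-value representation comes from your field definitions. The record values are made up.

```python
def is_missing(value: str) -> bool:
    """Assumed rule: blanks, all zeroes, or all nines count as missing."""
    stripped = value.strip()
    return stripped == "" or set(stripped) <= {"0"} or set(stripped) <= {"9"}


# Each pass lists (field, prefix length); None means use the whole value.
PASS_1 = [("last_name", None), ("sex", None), ("birth_year", None)]
PASS_2 = [("birth_month", None), ("birth_day", None),
          ("first_name", 1), ("middle_initial", None)]


def blocking_key(record, pass_spec):
    """Return the blocking key for a pass, or None if a value is missing."""
    parts = []
    for field, prefix_len in pass_spec:
        value = record[field]
        if is_missing(value):
            return None  # the record cannot be blocked in this pass
        value = value.strip()
        parts.append(value[:prefix_len] if prefix_len else value)
    return tuple(parts)


rec_a = {"last_name": "SMITH", "middle_initial": "Q", "first_name": "JOHN",
         "sex": "M", "birth_year": "1961", "birth_month": "07",
         "birth_day": "04"}
# Same person, but the birth year was keyed incorrectly in the second file.
rec_b = dict(rec_a, birth_year="1916")
# Same person again, with the middle initial left blank.
rec_c = dict(rec_a, middle_initial=" ")

print(blocking_key(rec_a, PASS_1) == blocking_key(rec_b, PASS_1))  # False
print(blocking_key(rec_a, PASS_2) == blocking_key(rec_b, PASS_2))  # True
print(blocking_key(rec_c, PASS_2))  # None
```

The birth-year error keeps the first two records out of the same block in the first pass, but the second pass brings them together. The third record has no blocking key in the second pass because one of its blocking fields is missing.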

Matching Phase
After creating a block of records, the Match stage compares fields that
you specified as matching fields to determine the best match for a
record or, in the case of an unduplicating match, the master record
and associated duplicates. The Match stage provides over 20 types of
comparison, which are algorithms based on the type of data in the
fields, such as numeric data versus character strings or parts of street
addresses.
To determine whether a record is a match, the Match stage calculates
a weight for each comparison, according to the probability associated
with each field. The Match stage uses two probabilities for each field:
the m-probability and the u-probability.

About m-probability and u-probability


The m-probability is the probability that the field agrees given the
record pair is a match. The m-probability is one minus the error rate
of the field. That is, if a field in a sample of matched records disagrees
10% of the time, the m-probability for this field is 1 - 0.1, or 0.9.
The u-probability is the probability that the field agrees given the
record pair is unmatched. The u-probability is the probability that the
field agrees at random.
For example, the probability that sex agrees at random is 50%. With a
uniform distribution, there are four equally likely combinations, and sex
agrees in two of the four, which gives a u-probability of 0.5:

FileA FileB
M M
M F
F M
F F
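The two probabilities can be reproduced with a short calculation. The following Python sketch is illustrative only; the function names are hypothetical, and the u-probability function assumes that the field's values are uniformly distributed, as in the example above.

```python
from itertools import product


def m_probability(error_rate: float) -> float:
    """m-probability is one minus the error rate of the field."""
    return 1.0 - error_rate


def u_probability_uniform(values) -> float:
    """Chance that two values drawn uniformly at random agree."""
    pairs = list(product(values, repeat=2))   # every FileA/FileB combination
    agreeing = sum(1 for a, b in pairs if a == b)
    return agreeing / len(pairs)


print(round(m_probability(0.10), 2))          # 0.9
print(u_probability_uniform(["M", "F"]))      # 0.5
```

Real data is rarely uniform, so in practice the u-probability depends on how often each value occurs in your files.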

About Weights
For each matching field, the Match stage computes a weight. If the
comparison between a pair of fields agrees, the pair of fields receives
an agreement or positive weight, which is calculated as
log2(m-probability / u-probability). If the comparison disagrees, the
pair of fields receives a disagreement or negative weight, which is
calculated as log2((1 - m-probability) / (1 - u-probability)).
The Match stage sums the weights assigned to each field comparison
and obtains a composite weight. The agreement weight of each field
adds to the composite weight, and the disagreement weight subtracts
from the composite weight; that is, the higher the composite weight,
the greater the agreement.
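As a worked illustration of the two weight formulas and of the composite weight, consider the following Python sketch. The function names and the probabilities used for the sample fields are hypothetical, and the sketch treats every comparison as a plain agree or disagree decision.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Positive weight when the field comparison agrees."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Negative weight when the field comparison disagrees."""
    return math.log2((1 - m) / (1 - u))


def composite_weight(comparisons) -> float:
    """Sum the per-field weights; each item is (agrees, m, u)."""
    return sum(
        agreement_weight(m, u) if agrees else disagreement_weight(m, u)
        for agrees, m, u in comparisons
    )


# Sex, with m = 0.9 and u = 0.5 as in the preceding examples.
print(round(agreement_weight(0.9, 0.5), 3))     # 0.848
print(round(disagreement_weight(0.9, 0.5), 3))  # -2.322

# A record pair that agrees on last name (assumed m = 0.9, u = 0.01)
# but disagrees on sex.
pair = [(True, 0.9, 0.01), (False, 0.9, 0.5)]
print(round(composite_weight(pair), 3))
```

A field with a low u-probability, such as last name, contributes a much larger agreement weight than a field such as sex, whose values agree at random half of the time.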


About Cutoffs
The composite weights assigned to each record pair create a
distribution of scores that range from very high positive to very high
negative. Within the distribution of positive values, you want to define
a cutoff at which any record pair receiving a weight equal to or greater
than this value is considered a match. This value is referred to as
the match cutoff.
Conversely, you want to define a cutoff at which any record pair
receiving a weight equal to or less than this value is considered a
non-match. This value is referred to as the clerical cutoff. Any record
pairs with weights that fall between these two cutoff values are
considered clerical review cases.
If more than one record pair receives a composite weight higher than
the match cutoff weight, those records are declared duplicates. The
way in which duplicate records are handled is based on what type of
matching you selected (see “Defining a Match Stage” on page 12-11).
Any record pair that falls below the clerical cutoff becomes a residual
and is eligible for the next matching pass.
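The way the two cutoffs divide record pairs can be sketched as a simple classification. The Python function below is an illustration, not QualityStage code. The function name and the cutoff values are hypothetical, and the handling of a weight exactly equal to the clerical cutoff follows the "equal to or less than" wording above, which is an assumption about the boundary case.

```python
def classify(weight: float, match_cutoff: float, clerical_cutoff: float) -> str:
    """Classify one record pair by its composite weight."""
    if clerical_cutoff > match_cutoff:
        raise ValueError("clerical cutoff must not exceed match cutoff")
    if weight >= match_cutoff:
        return "match"
    if weight <= clerical_cutoff:
        return "nonmatch"   # becomes a residual for the next pass
    return "clerical"       # falls between the two cutoffs


for w in (12.5, 10.0, 7.0, 4.0, -3.2):
    print(w, classify(w, match_cutoff=10.0, clerical_cutoff=4.0))
```

Setting both cutoffs to the same value removes the clerical review range, so every pair is either a match or a non-match.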

About Unduplication
When unduplicating a file, you are essentially grouping records that
share common attributes. You might unduplicate a file to group all
invoices for a customer or merge a mailing list.
With unduplication, the Match stage declares all records with weights
above the match cutoff as a set of duplicates. The Match stage then
identifies a master record by selecting the record within the set that
matches to itself with the highest weight. The master record is
associated with its set of duplicates.
Any records that are not part of a set of duplicates are declared
residuals. These and the master records are generally made available
for the next pass. Duplicates are not included in subsequent passes,
because you want them to belong to only one set.
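One way to picture the selection of a master record is sketched below in Python. This is an interpretation, not the documented algorithm: it assumes that when a record is compared to itself, every populated field agrees and contributes its agreement weight, and that a missing field contributes nothing. Under that assumption, the most complete record wins. The probabilities, field names, and records are made up.

```python
import math

# Hypothetical (m-probability, u-probability) for each matching field.
FIELD_PROBS = {
    "last_name": (0.9, 0.01),
    "first_name": (0.9, 0.05),
    "zip": (0.95, 0.1),
}


def self_match_weight(record: dict) -> float:
    """Weight of a record matched to itself (assumed: blanks add nothing)."""
    return sum(
        math.log2(m / u)
        for field, (m, u) in FIELD_PROBS.items()
        if record.get(field, "").strip()
    )


def pick_master(duplicate_set: list) -> dict:
    """Choose the record with the highest self-match weight."""
    return max(duplicate_set, key=self_match_weight)


duplicates = [
    {"id": "R1", "last_name": "SMITH", "first_name": "JOHN", "zip": ""},
    {"id": "R2", "last_name": "SMITH", "first_name": "JOHN", "zip": "02116"},
    {"id": "R3", "last_name": "SMITH", "first_name": "", "zip": ""},
]

print(pick_master(duplicates)["id"])  # R2
```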
When selecting fields for matching (including unduplication matches),
you want to include as many fields as possible. You should include all
fields in common on both files for the matching fields. Include fields
that are not very reliable and assign them low m-probabilities.
If you want the Match stage to do exact matching only, specify the
blocking fields and not the matching fields. This results in all record
pairs that agree on the blocking fields being declared matches or
duplicates.

Reviewing the Results


Match provides a statistical report and a report that displays the
results of your Match stages. You can request a statistical report of
each match pass that you run. Depending on the match process, this
report displays information about the match run and frequency
distributions, summary statistics, and histograms.
When you request a statistical report, you need to specify a base
filename for it, and the Match stage appends an underscore ( _ ), the
number of the pass for which the report is generated, and the .OUT
extension. You request this report with the Match Settings dialog box
(see “Running Match Jobs” on page 12-42 for more information).
You can request a report displaying the results of each match pass or
a select set of match passes. The Match stage generates a file
containing the matched, clerical, and duplicate records from the
specified match pass. You can customize the report layout to include
the residual records, to write the report to multiple files, and to group
records for unduplication runs.

Extracting Data
During the matching job, the Match stage stores the decisions made
on each record pair for each pass. If requested, the Match stage
creates output files of all matched records, of the clerical review
records, and the duplicate and residual records on both FileA and
FileB.


You can customize what output files are created and what records are
written to those files. You can also customize unduplication matches
to create sets of groups.

Defining Match Files


You define two input files, FileA and FileB, unless you are
unduplicating a file. In that case you need to define only FileA. You
define output files only for Match reports and extracts that you are
customizing.

Defining Input Files


The data in the input files must be in fixed-format fields. You must
define all fields that you plan to use for blocking and matching, as well
as any fields that you want to appear in customized reports or to be
included in output files from running a customized extract.
Each field in the record must be fixed format. If your input files are
free-formatted, you can use either a Standardize stage or create a job
that uses other stages to standardize and format your files. Make sure
that you have defined for each field how the Match stage is to handle
missing values (see “Working with Data Files” on page 4-14 for
details).
If your input files are the result of running either a Standardize stage
or a job using other stages, you do not need to define new files or
modify the field definitions. Any field in the results file from a job can
be used for blocking and matching.

Defining Output Files


You need to define output files only when you customize a Match
report or an extract run. You then select the file from the Select
Outputs list box of the Report Wizard or the Extract Wizard.


When you define these files, you need to specify only a name and
description for this file using the Add a New Datafile dialog box.

Important: Do not add any fields to this data file definition. This file
definition does not require or use field definitions.

Defining a Match Stage


In defining a Match stage, you define each pass. For each pass, you
define one or more blocking variables and one or more matching
variables.

With the Match stage, you can perform the following types of
matching:
• Match.
Matches a record on FileA to only one record on FileB. Any other
records that match are considered duplicates.
• Match Sets.
Matches duplicate sets of records on FileA with duplicate sets of
records on FileB. For example, if you have two sets of records on
both FileA and FileB for John Doe, the Match stage generates two
matches: one for each matching John Doe.
• Geomatch.
Geographic matching (geocoding) or many-to-one matching, in
which more than one FileA record can match the same FileB record.
This type of matching is similar to matching with the Unijoin
stage.

Note: When the number of input records is very small compared to the
number of records in the reference database, we suggest you run
the Geomatch in File Mode. Doing so improves performance.

• Geomatch Multiple.
Multiple records on FileB having the same weight as the matched
pair are flagged as duplicate records. For example, suppose 101 Main St.
on FileA matches two records on FileB: 101-199 Main St SW and
101-199 Main St SE. One FileB record is the matched record and the
other is the duplicate.
• Geomatch Duplicates.
Similar to the Geomatch Multiple option, except that additional
FileB records that match to a level above the duplicate cutoff
value are flagged as duplicates.
• Undup.
Locates duplicate records in FileA. This option allows grouping of
records, such as finding all members of a household or finding
duplicate customers, patients, sales, etc.
• Undup Independent.
Similar to the Undup option, except that all records including the
duplicates are made available to the next pass. With every pass
after the second pass, the Match stage groups duplicate records
from the two passes.
To define a Match stage:

1. Add a new job to your project.


2. Use the Match Stage wizard to name and define a new Match
stage.
3. Use the Select Match Build Method dialog box to specify whether
you want to use:
• A stage Template
• A custom match specification
• A Match spec

Creating a Match Stage


To create a Match stage:
1. On the left pane of the QualityStage main window, select a Stages
folder.


2. Do one of the following:

• From the Toolbar, choose ➤ Stage ➤ Match.


• Right-click anywhere on the right pane, click New Stage, and
then click Match.
The Match Stage wizard appears.

3. Under Name, enter up to eight alphanumeric characters.


4. Under Description, enter up to 40 characters.
5. Under Options, select one of the following:
• Match


• Match Sets
• Geomatch
• Geomatch Multiple
• Geomatch Duplicates
• Undup
• Undup Independent
6. Under Data File A, select the file to be FileA.
7. If you selected either the Undup or the Undup Independent
option, click Next.
If you selected any other option, select the file to be FileB under
Data File B, and then click Next. Data File A must be a different
file from Data File B.
The Select Match Build Method dialog box appears.

Now decide whether you want to:


• Use a template,
• Create a custom match specification, or
• Select a Match Spec from the Match Spec Library.


Using a Template Match Specification


Use the Select Match Build Method dialog box to specify whether you
want to:
• Use a template for specifying match criteria,
• Create a custom match specification, or
• Use a match spec from the match spec library.
A template provides default match settings that you can use to
configure the criteria for a match job. Using a template, you can
quickly create a Match stage by answering a few questions about your
matching criteria and data.

Template Match Data Requirements


Before you define a default match, you must prepare your data. You
must standardize your data files with the USNAME, USADDR, and
USAREA rule sets. You also need to make sure that the fields used for
blocking and matching are filled with comparable amounts and types
of data. For example, if one of your files contains data in a field used
for matching and the other file does not, the match is affected.

Individual Customer Match


This match blocks on address elements, including house number, ZIP
code, and street name NYSIIS and SOUNDEX values. The match
comparisons match the name of an individual. This match is intended
to locate all of the records related to a single individual.

Derived Fields
   ZIP3 (first 3 chars of ZCUSARE - ZipCode)
   FIRST1 (first char of F1USNAM - Name1FirstName)

Blocking Fields
   ZIP3 (first 3 chars of ZCUSARE - ZipCode)
   ZCUSARE - ZipCode
   SSUSADD - ReverseSoundexofStreetName
   HNUSADD - HouseNumber
   FIRST1 (first char of F1USNAM - Name1FirstName)
   X1USNAM - Name1L1NYSIIS

Matching Fields
   F1USNAM - Name1FirstName
   M1USNAM - Name1MiddleName
   L1USNAM - Name1PrimaryName
   S1USNAM - Name1Suffix
   C1USNAM - Name1GenderCode
   H1USNAM - Name1L1HashKey
   SNUSADD - StreetName
   UVUSADD - UnitValue
   HNUSADD - HouseNumber
   ZCUSARE - ZipCode

VarTypes
   HNUSADD - HouseNumber
   C1USNAM - Name1GenderCode
   S1USNAM - Name1Suffix

Individual File Unduplication Match


The following match is a template for unduplicating a single file. A
single file is matched against itself. The template blocks on portions of
first name and ZIP code. It matches on address data.

Derived Fields
   ZIP54 (first 4 chars of ZCUSARE - ZipCode)
   E1STNM (first char of E1USNAM - Name1EnhancedFirstName)

Blocking Fields
   X1USNAM - Name1L1NYSIIS
   E1STNM (first char of E1USNAM - Name1EnhancedFirstName)
   C1USNAM - Name1GenderCode
   ZIP54 (first 4 chars of ZCUSARE - ZipCode)
   ATUSADD - AddressType
   RVUSADD - RuralRouteValue
   BVUSADD - BoxValue
   NSUSADD - NYSIISofStreetName

Matching Fields
   E1USNAM - Name1EnhancedFirstName
   M1USNAM - Name1MiddleName
   HNUSADD - HouseNumber
   HSUSADD - HouseNumberSuffix
   PDUSADD - StreetPrefixDirectional
   SNUSADD - StreetName
   SDUSADD - StreetSuffixDirectional
   BNUSADD - BuildingName
   ZCUSARE - ZipCode
   NCUSARE - CityNYSIIS
   BVUSADD - BoxValue
   G1USNAM - Name1Generation
   C1USNAM - Name1GenderCode
   L1USNAM - Name1PrimaryName
   RVUSADD - RuralRouteValue
   BTUSADD - BoxType

VarTypes
   C1USNAM - Name1GenderCode
   G1USNAM - Name1Generation
   HNUSADD - HouseNumber
   HSUSADD - HouseNumberSuffix
   PDUSADD - StreetPrefixDirectional
   SDUSADD - StreetSuffixDirectional
   RVUSADD - RuralRouteValue
   BVUSADD - BoxValue
   ZCUSARE - ZipCode
   NCUSARE - CityNYSIIS

Business File Unduplication Match


The following match is a template for unduplicating a file based on
business name and address information.

Derived Fields
   ZIP54 (first 4 chars of ZCUSARE - ZipCode)
   E1STNM (first char of E1USNAM - Name1EnhancedFirstName)
   HASH4 (first 4 chars of L1USNAM - Name1PrimaryName)

Blocking Fields
   HASH4 (first 4 chars of L1USNAM - Name1PrimaryName)
   ZIP54 (first 4 chars of ZCUSARE - ZipCode)
   ATUSADD - AddressType
   BVUSADD - BoxValue
   NSUSADD - NYSIISofStreetName
   RVUSADD - RuralRouteValue

Matching Fields
   L1USNAM - Name1PrimaryName
   HNUSADD - HouseNumber
   HSUSADD - HouseNumberSuffix
   PDUSADD - StreetPrefixDirectional
   SNUSADD - StreetName
   SDUSADD - StreetSuffixDirectional
   RVUSADD - RuralRouteValue
   FVUSADD - FloorValue
   UVUSADD - UnitValue
   BNUSADD - BuildingName
   ZCUSARE - ZipCode
   NCUSARE - CityNYSIIS
   BVUSADD - BoxValue

VarTypes
   HNUSADD - HouseNumber
   HSUSADD - HouseNumberSuffix
   PDUSADD - StreetPrefixDirectional
   SDUSADD - StreetSuffixDirectional
   RVUSADD - RuralRouteValue
   BVUSADD - BoxValue
   FVUSADD - FloorValue
   UVUSADD - UnitValue
   ZCUSARE - ZipCode
   NCUSARE - CityNYSIIS

Individual Householding Match


The following match is a template for identifying all the members of a
group. This process is referred to as householding. You can, for
example, use this template to identify all of the people living in the
same household, i.e. at the same address. This match blocks on
portions of ZIP code and name data and matches on address data.

Derived Fields
   NONE

Blocking Fields
   HNUSADD - HouseNumber
   NSUSADD - NYSIISofStreetName
   ZCUSARE - ZipCode
   ATUSADD - AddressType
   BTUSADD - BoxType
   BVUSADD - BoxValue
   RVUSADD - RuralRouteValue

Matching Fields
   HSUSADD - HouseNumberSuffix
   PDUSADD - StreetPrefixDirectional
   SNUSADD - StreetName
   SDUSADD - StreetSuffixDirectional
   RVUSADD - RuralRouteValue
   BVUSADD - BoxValue
   BNUSADD - BuildingName
   L1USNAM - Name1PrimaryName
   HNUSADD - HouseNumber

Var Types
   HSUSADD - HouseNumberSuffix
   PDUSADD - StreetPrefixDirectional
   SDUSADD - StreetSuffixDirectional
   RVUSADD - RuralRouteValue
   BVUSADD - BoxValue

Business Householding Match


The following match is a template for identifying all the individual
members of a group. This process is referred to as householding. This
template differs from the individual householding match because the
conventions for company addresses are used for the householding
match. This match blocks and matches against address data.

Derived Fields
   NONE

Blocking Fields
   ZCUSARE - ZipCode
   HNUSADD - HouseNumber
   NSUSADD - NYSIISofStreetName
   ATUSADD - AddressType
   BTUSADD - BoxType
   BVUSADD - BoxValue
   RVUSADD - RuralRouteValue

Matching Fields
   L1USNAM - Name1PrimaryName
   PDUSADD - StreetPrefixDirectional
   STUSADD - StreetSuffixType
   SDUSADD - StreetSuffixDirectional
   HSUSADD - HouseNumberSuffix
   SNUSADD - StreetName
   RVUSADD - RuralRouteValue
   BVUSADD - BoxValue
   FVUSADD - FloorValue
   UVUSADD - UnitValue
   BNUSADD - BuildingName

Var Types
   HSUSADD - HouseNumberSuffix
   PDUSADD - StreetPrefixDirectional
   SDUSADD - StreetSuffixDirectional
   RVUSADD - RuralRouteValue
   BVUSADD - BoxValue
   FVUSADD - FloorValue
   UVUSADD - UnitValue


To Use a Template Match Specification:

1. In the Select Match Build Method dialog box, click Template.
The Create Matching Application dialog box appears, providing
you with instructions and specific questions that you need to
answer:

2. From the Response menu, select an answer to the question that
appears above it, and then click Next.
3. Repeat step 2, answering each question in turn.


The following table provides information for a sample response.

Response          Individual Customer Match
Description       Groups records of the same individual (not businesses)
Requirements      A standardized input file using the USADDR, USAREA, and
                  USNAME Domain-Specific rule sets.
Default Passes    Last Name & Address
                  ZIP Code & First Name
                  House Number & ZIP code
Optional Passes   Telephone Number
                  Identification

4. Click Next after you answer each question.


After you answer all the questions, the following message appears:

5. Click OK.


The Match Wizard – Match Specifications dialog box appears:

Note: QualityStage creates optional passes based upon your
answers to the questions in the Create Matching Application
dialog box.
For instance, the Individual Customer Match template asks you to
supply field definitions that contain the telephone number and the
social security number.
If you supplied these field definitions, QualityStage creates passes
for these fields. If you specified "none," no passes are created.
For more information on defining passes, specifying blocking
fields, and matching fields, see “Defining a Pass” on page 12-27,
“Specifying Blocking Fields” on page 12-27, and “Specifying
Matching Fields” on page 12-30.


6. You can view and modify the specifications as described in this
chapter, or you can click Finish to close the dialog box.

Creating a Custom Match Specification


If your match criteria cannot be configured using a template, create a
custom match specification.
To create a custom Match specification:
1. In the Select Match Build Method dialog box, click Custom.


The Match Wizard – Match Specifications dialog box appears.

2. Click the match specification activity you need. For the defined
Match stage, you can:
• Add or modify passes
• Define variable types
• Define a report
• Define an extract
You can also delete passes and rearrange the order in which they
are executed. The first pass listed becomes Pass 1, the second Pass
2, the third Pass 3, and so on.

Tip: By expanding the passes, you can view the blocking variables
assigned to each pass.


The following sections provide more information on defining passes,
specifying blocking fields, and specifying matching fields.

Defining a Pass
Defining a pass involves specifying which fields on your data files are
to be used for blocking and which for matching. All passes require that
you define the blocking fields. However, if you want exact matching,
you need to specify only your blocking fields.
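
The role of blocking in a pass can be illustrated with a minimal Python sketch. It assumes records are plain dictionaries, and the field names (zip, house_number) and the helper block_pairs are hypothetical; it is not how QualityStage is implemented. Only records that agree on every blocking field are paired, which is why a pass with blocking fields and no matching fields amounts to exact matching.

```python
from collections import defaultdict

def block_pairs(file_a, file_b, blocking_fields):
    """Return (a, b) record pairs that agree on every blocking field."""
    # Index FileB records by their blocking key.
    index = defaultdict(list)
    for rec in file_b:
        key = tuple(rec[f] for f in blocking_fields)
        index[key].append(rec)
    # Pair each FileA record only with FileB records in the same block.
    pairs = []
    for rec in file_a:
        key = tuple(rec[f] for f in blocking_fields)
        for candidate in index.get(key, []):
            pairs.append((rec, candidate))
    return pairs

file_a = [{"id": "A1", "zip": "02134", "house_number": "12"},
          {"id": "A2", "zip": "10001", "house_number": "7"}]
file_b = [{"id": "B1", "zip": "02134", "house_number": "12"},
          {"id": "B2", "zip": "02134", "house_number": "99"}]

pairs = block_pairs(file_a, file_b, ["zip", "house_number"])
pair_ids = [(a["id"], b["id"]) for a, b in pairs]
```

In a real pass, the matching fields would then be compared only within the candidate pairs that blocking produces.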

Specifying Blocking Fields


When you click either Add or Modify Pass, the Blocking Variables
dialog box appears. With this dialog box, you specify a pair of fields,
one from the Data FileA and one from the Data FileB, unless you are
running an Undup Match stage, in which case you specify only the
blocking field on the Data FileA.
You must specify at least one blocking field, and you can specify as
many as 20 blocking fields for a single pass. The length of the fields
does not have to be the same on both files.
To specify blocking fields for a match pass:

1. In the Match Wizard – Match Specifications dialog box, click Add
or Modify Pass.


If you specified the Match, Match Sets, Geomatch, Geomatch
Multiple, or Geomatch Duplicates option, the following Match
Wizard – Blocking Variables dialog box appears:


If you specified either the Undup or the Undup Independent
option, the following Blocking Variables dialog box appears:

2. Next to Description, enter a string to identify the pass.
This string identifies the pass in the Match Wizard – Match
Specifications dialog box.
3. Under Available Data A Fields, select a field from FileA.
If you selected Undup or Undup Independent, proceed to step 5.
4. Under Available Data B Fields, select a field from FileB.
5. Select:
• Character Comparison if the fields are alphanumeric.
• Numeric Comparison if the fields are only numeric.


6. Click Add to Block Specifications.


7. Repeat steps 3 through 6 for additional blocking fields.
8. After you specify all blocking fields, click Next.
The Match Wizard – Match Pass dialog box appears.

Specifying Matching Fields


With the Match Wizard – Match Pass dialog box, you specify:
• The field names for matching
• The type of comparison
• The m-probability and u-probability for each comparison
Optionally, you can specify a calculation to override the weights for a
comparison. In addition, you can set the cutoffs for the match, the
clerical review, and the duplicates for the entire match pass.
When you select a comparison, the Match Wizard – Match Pass dialog
box displays the fields, additional parameters, and modes required by
the comparison.
The Match stage supports reverse matching and matching on arrays
for specific comparison types. See “Using Reverse Matching” on page
12-35 and “Using Arrays” on page 12-36 for more information.


If you specified the Match, Match Sets, Geomatch, Geomatch
Multiple, or Geomatch Duplicates option, the following Match Wizard
– Match Pass dialog box appears:


If you specified either the Undup or the Undup Independent option,
the following Match Wizard – Match Pass dialog box appears:

1. Under Compose Match Command, select a type of comparison.
The required fields are listed under Fields, and the required
parameters and modes are listed under Command Options.
See Appendix B, “Match Comparisons” for descriptions of each
comparison type.
2. Under Available Data Fields A, select the desired field, click ,
and repeat if required.
3. Under Available Data Fields B, select the desired field, click ,
and repeat if required.


Tip: Generally you do not want to use the blocking fields for this
pass for the matching fields. You can use blocking fields from
other passes for the matching fields on this pass.

4. Under Command Options, enter the m-probability.


• For most fields, use the default value of 0.9.
• For very important fields, use 0.999.
• For moderately important fields, use 0.95.
• For fields with poor reliability (such as street direction), use
0.8.
The higher you set the m-probability, the greater the penalty
when the field does not match.
5. Enter the u-probability.
The following are recommendations:
• For most data, use the default value of 0.01.
• For age, use 0.02.
• For sex, use 0.5.
See “Specifying the m-Probability and u-Probability” on page
12-35.
6. Enter any required and optional parameters.
7. Enter the mode if required.
8. (Optional) Click Override Weights to specify weights overrides.
See “Specifying Weight Overrides” on page 12-36 for more
information.
9. After defining the first comparison, click Add to Match Pass.
10. Repeat step 1 through step 9 for each additional comparison.
You can specify up to 40 comparisons.
11. (Optional) Under Match Pass Cutoffs, set the cutoff weights for
Match, Clerical, and Duplicate.
See “Assigning Match Pass Cutoffs” on page 12-34 for details.
12. Click OK.


The following sections provide information on:


• Setting cutoffs
• Using the m-probability and the u-probability
• Reverse matching
• Matching arrays
• Each comparison type
• Specifying override weights

Assigning Match Pass Cutoffs


The lower right corner of the Match Wizard – Match Pass dialog box
allows you to specify the cutoff thresholds for a match, a clerical
review, and duplicates for the entire pass.
To specify cutoff thresholds:

Cutoff       Description
Match        When a record pair receives a composite weight greater than or
             equal to this weight, the pair is declared a match.
Clerical     When a record pair receives a composite weight greater than or
             equal to this weight and less than the Match cutoff, the pair is
             declared a clerical review. This weight is equal to or less than
             the Match cutoff weight.
             If you do not want a clerical review, set the Clerical cutoff equal
             to the Match cutoff.
Duplicate    The lowest weight that a record pair can have to be considered
             a duplicate. This cutoff weight is optional and must be higher
             than the Match cutoff weight. Note that this cutoff is not used
             with Undup.

Initially you could try setting the cutoffs very low, such as zero (0).
Run a match for the pass and generate a report ordered by weight.
The high weight matches are the best. As the weight goes down, your
confidence in the match should decrease. Assign cutoffs at a weight
that is appropriate for your project.


With the Undup and Undup Independent options, duplicates with a
weight below the Match cutoff and above the Clerical cutoff are
automatically declared clerical review.
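
The way the Match and Clerical cutoffs partition record pairs can be sketched in Python as follows. The function name classify_pair and the outcome labels are hypothetical, and the optional Duplicate cutoff is deliberately not modelled; treat this as an illustration of the table above, not as the product's logic.

```python
def classify_pair(weight, match_cutoff, clerical_cutoff):
    """Classify a record pair from its composite weight and the pass cutoffs."""
    if clerical_cutoff > match_cutoff:
        raise ValueError("Clerical cutoff must not exceed the Match cutoff")
    if weight >= match_cutoff:
        return "match"
    if weight >= clerical_cutoff:
        return "clerical"
    return "nonmatch"
```

Setting the Clerical cutoff equal to the Match cutoff leaves no weight range for clerical review, so every pair is either a match or a nonmatch.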

Specifying the m-Probability and u-Probability


Match requires an m-probability and a u-probability assigned to each
comparison. You need to provide an initial estimate for the
m-probability and u-probability.
The m-probability reflects the reliability of the field: it is one
minus the error rate for the field. For example, if you determine that
the first name in a record has a 10% error rate, you would assign 0.9
as the m-probability for that field. The closer the m-probability is to
one, the more critical a disagreement on the field becomes. You can use
a very high m-probability to give very important fields a high penalty
for disagreeing.
The u-probability is the probability of an accidental agreement on a
field. If you assign a high u-probability, a low weight is calculated
when a field agrees.
During the match run, the Match stage calculates a u-probability for
each comparison. As a result, your only concern with the
u-probability is that it must be less than the m-probability. If you need
to control the u-probability (such as for fields containing an individual
identification number, a Social Security Number, or a Patient
Identification Number), specify the Vartype to be NOFREQ (see
“Defining Vartypes” on page 12-40).
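
The relationship between the two probabilities and the resulting weights can be illustrated with a short Python sketch. It assumes the standard log-odds (base 2) formulation used in probabilistic record linkage; this section does not state the exact formula the Match stage uses, so the functions below are an illustration under that assumption.

```python
import math

def _check(m, u):
    # The u-probability must be less than the m-probability.
    if not 0.0 < u < m < 1.0:
        raise ValueError("expected 0 < u-probability < m-probability < 1")

def agreement_weight(m, u):
    """Weight contributed when the field agrees (assumed log2(m / u))."""
    _check(m, u)
    return math.log2(m / u)

def disagreement_weight(m, u):
    """Weight contributed when the field disagrees (assumed log2((1-m)/(1-u)))."""
    _check(m, u)
    return math.log2((1.0 - m) / (1.0 - u))
```

Under this assumption, the default values (m = 0.9, u = 0.01) add about 6.5 for an agreement and subtract about 3.3 for a disagreement. Raising m toward one makes the disagreement weight more negative, and raising u lowers the agreement weight, which is consistent with the behavior described above.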

Using Reverse Matching


The Match stage allows reversing the matching process. Reverse
matching assigns the agreement weight when the fields disagree and
the disagreement weight when the fields agree. You can use reverse
matching with some comparisons. When you select a comparison that
supports reverse matching, the Reverse option is available in the
Match Wizard – Match Pass dialog box.
For the comparisons requiring parameters, the roles of the agreement
weight and disagreement weight are reversed. For example, the full
agreement weight is assigned if the fields are different to a degree
greater than the parameter specified, and the full disagreement
weight is assigned if the fields are equal.
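
As a minimal sketch of the idea, the following Python function swaps the two weights when a hypothetical reverse flag is set. The function name and signature are illustrative only.

```python
def field_weight(agree, agreement_w, disagreement_w, reverse=False):
    """Return the weight for one comparison, optionally reversed."""
    if reverse:
        # Reverse matching: disagreement earns the agreement weight and vice versa.
        agree = not agree
    return agreement_w if agree else disagreement_w
```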

Using Arrays
The Match stage provides the ability to compare arrays of fields on
FileA to arrays of fields on FileB or arrays of fields on a single file for
an unduplication run. An array can comprise any number of fields,
including one. To use arrays, you need to define them (see “Defining
Arrays” on page 4-27).
Using arrays allows you to reduce the number of cross comparisons
you would have to define. For example, if you have first name,
middle name, and last name fields whose values might appear in any
order (for example, a first name in the last name field), arrays compare
all names without regard to order.
Array matching is available with some comparisons. When you select
a comparison that supports array matching, the Array option is
available in the Match Wizard – Match Pass dialog box.
When calculating the weights for an array, the Match stage never
allows the weight for the array to exceed the weight that would result
if a single field were compared. This keeps the weights for array
comparisons from dominating weights of single fields.
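
A deliberately simplified Python sketch of order-insensitive array comparison follows. The actual weighting algorithm is not described in this section, so the function array_weight, its exact-equality test, and its treatment of empty arrays are all assumptions. The one property taken from the text is that the result never exceeds the weight of a single agreeing field.

```python
def array_weight(values_a, values_b, agreement_w, disagreement_w):
    """Compare two arrays of values without regard to order."""
    # Normalize and drop missing (blank) entries.
    a = {v.strip().upper() for v in values_a if v.strip()}
    b = {v.strip().upper() for v in values_b if v.strip()}
    if not a or not b:
        return 0.0  # assumption: an empty array contributes nothing
    # Any shared value counts as agreement; the result is capped at the
    # weight of one agreeing field, however many values coincide.
    return agreement_w if a & b else disagreement_w
```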

Specifying Weight Overrides


Sometimes the normal method of calculating the weights is not
appropriate. The Weight Overrides dialog box allows you to change the
calculated weights for missing value situations or for specific
combinations of values. You can use this dialog box to control the
weights for each pass independently.
For a match using the Undup or Undup Independent option, the
values for FileA and FileB are for the two records being compared in
the same file. If you specify a value for AM, you should also specify one
for BM.


Examples of using weight overrides:
• Using directions in street addresses for matching, so that the address
123 Main St matches 123 Main St in preference to 123 N Main St.
(Normally, missing to a value receives the same weight as missing to
missing.)
To do this, you might add a weight of 1.0 to Both Missing Value
Weight (XM) so that a missing value on both files receives a slightly
higher weight than a missing value on only one file. This results in
123 Main St to 123 Main St receiving a higher weight than 123 Main St
to 123 N Main St.
• Handling bogus values for telephone numbers in an unduplication
run. The telephone number 1111111111 should not be matched.
To do this, you might replace the calculated Agreement Weight
(AW) with –10 for a FileA Value of '1111111111'.
• Handling missing house numbers when trying to unduplicate a list of
customers.
To do this, you might add a weight of –10 to the FileA Missing
Value Weight (AM) and the FileB Missing Value Weight (BM). If
either the A or the B value is missing, ten points are subtracted from
the calculated weight. If both are missing, nothing is subtracted. The
probability of a house number being missing in both records
because of a coding error is much greater than the probability that
there really is no house number for that address.
To specify weight overrides for the entire comparison:

1. Click Override Weights.


The Weight Overrides dialog box appears:

2. Select one of the following:


• Replace, to replace the weight calculated for this field with the
weight that you are specifying.
• Add, to add the weight that you are specifying to the weight
calculated for this field.


3. Enter one or more of the five weight overrides:

Next to                        Enter
Agreement Weight (AW)          An agreement weight if the values for the
                               field agree and are not missing.
Disagreement Weight (DW)       A disagreement weight if the values for
                               the field disagree and are not missing.
FileA Missing Weight (AM)      A weight when the value on FileA is
                               missing.
FileB Missing Weight (BM)      A weight when the value on FileB is
                               missing.
Both Missing Weight (XM)       A weight when values are missing on
                               both files.

You can specify negative values. If you are adding to the weights, a
negative value causes points to be subtracted from the calculated
weight for the field.
4. (Optional) Specify a value in the Conditional FileA Value or the
Conditional FileB Value, or both, as described here:

Next to                        Enter
Conditional FileA Value (AV)   The value, enclosed in single quotes ('), in
                               a field on FileA or the word ALL.*
Conditional FileB Value (BV)   The value, enclosed in single quotes ('), in
                               a field on FileB or the word ALL.*

If you specify a value, the weight statement is conditional on the
value of the field being equal to the value specified. If you specify
the value ALL, or the argument is missing, the weight statement
applies to all non-missing values of the field.
5. Click Add Override.
6. Click OK and return to the Match Pass dialog box.
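
The effect of the Add and Replace options and of the five override weights can be sketched in Python as follows. The dictionary layout, the function apply_override, and the use of exact equality to decide agreement are assumptions made for illustration; real comparisons may use other agreement rules.

```python
def apply_override(calculated, value_a, value_b, override):
    """Apply one weight override to a calculated field weight.

    override: dict with 'type' ('ADD' or 'REPLACE') and any of the keys
    AW, DW, AM, BM, XM, plus optional conditional values AV and BV.
    """
    a_missing = value_a in (None, "")
    b_missing = value_b in (None, "")
    if a_missing and b_missing:
        key = "XM"
    elif a_missing:
        key = "AM"
    elif b_missing:
        key = "BM"
    else:
        # Conditional values restrict the override to specific field values;
        # ALL (or no value) applies it to all non-missing values.
        av = override.get("AV", "ALL")
        bv = override.get("BV", "ALL")
        if av != "ALL" and value_a != av:
            return calculated
        if bv != "ALL" and value_b != bv:
            return calculated
        key = "AW" if value_a == value_b else "DW"
    if key not in override:
        return calculated
    if override["type"] == "REPLACE":
        return override[key]
    return calculated + override[key]

# The three examples from this section, expressed as override dictionaries.
both_missing_bonus = {"type": "ADD", "XM": 1.0}
bogus_phone = {"type": "REPLACE", "AW": -10.0, "AV": "1111111111"}
missing_house_number = {"type": "ADD", "AM": -10.0, "BM": -10.0}
```

For instance, with bogus_phone, two records that both carry 1111111111 receive the replacement weight of -10 instead of the calculated agreement weight, while any other agreeing telephone number keeps its calculated weight.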


Defining Vartypes
You can assign a field or an array special treatment, such as
specifying that a disagreement on the field causes the record
pair automatically to be considered a nonmatch. You can assign a field
more than one special treatment. This treatment applies to all passes
for this match job.
To define Vartypes:

1. In the Match Wizard – Match Specifications dialog box, click
Vartype.
The Match Wizard – Vartype dialog box appears.


2. Next to Action, select one of the following treatments from the
drop-down list:

Action                 Description
CLERICAL               A disagreement on the field causes the record pair
                       automatically to be considered a clerical review case
                       regardless of the weight.
CLERICAL MISSINGOK     A missing value does not force the record pair into
                       clerical review, but a disagreement does.
CRITICAL               A disagreement on the field causes the record pair
                       automatically to be considered a non-match.
CRITICAL MISSINGOK     Missing values on one or both fields are acceptable; that
                       is, the record pair is automatically rejected only if
                       there is a non-missing disagreement.
NOUPDATE               The m-probability is not changed after running mprob.
NOFREQ                 A frequency analysis is not run on the field. Used when
                       a field has unique values, such as Social Security
                       number.
CONCAT                 Concatenates up to four fields to form one frequency
                       count.

3. Select one of the following:


• Datafield
• Array
4. Under Available Fields, select a field, and then click .
5. If you selected CONCAT, repeat step 4 for up to four additional
fields.
6. Click Add Vartype.
The fields and the specified actions appear under Vartypes.
7. Click OK.
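
The CRITICAL and CLERICAL actions can be sketched in Python as a post-processing step on a pair's outcome. The function apply_vartype and the outcome labels are hypothetical. The sketch also assumes that, for the plain (non-MISSINGOK) actions, a missing value compared with a non-missing value counts as a disagreement, which this section does not state explicitly.

```python
def apply_vartype(outcome, action, value_a, value_b):
    """Adjust a pair's outcome according to a Vartype action on one field."""
    missing = value_a in (None, "") or value_b in (None, "")
    disagree = value_a != value_b
    # MISSINGOK variants ignore the field when either value is missing.
    if action.endswith("MISSINGOK") and missing:
        return outcome
    if not disagree:
        return outcome
    if action.startswith("CRITICAL"):
        return "nonmatch"
    if action.startswith("CLERICAL"):
        return "clerical"
    return outcome
```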


Running Match Jobs


Add the stage to a job
Before you can use your Match stage, you must add it to a job. You can
either:
• Add your Match stage to an existing job, or
• Add a new job, and then add your Match stage to it.
For information about adding new jobs, see “Creating a New Job” on
page 6-4. For information about adding stages to jobs, see “Adding
Existing Stages to a Job” on page 6-6.
After adding your Match stage to a job, you can run it.

Setting up the file structure
If this is the first job that you have run in the project, you need to set
up the file structure for the project. See Chapter 5, “Setting Up Run
Profiles”, for instructions.
Once your data files are available to QualityStage, run the Match job.

How to run the job
1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. From the jobs list on the right pane, select the job you want to run.
3. Do one of the following:

• On the Toolbar, click .


• Right-click on the job you want to run, and then click Run.


The Job Run Options dialog box appears. It looks like this:

4. Select a run profile from the Profile list.


If you have not defined a run profile, click Setup. For information
about how to set up run profiles, see Chapter 5, “Setting Up Run
Profiles”.
5. Under Select Run Options, select Deploy, Run, or both.
See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.
6. (Optional) If you want to run a QualityStage formatted report
after you run the job, do the following:
a. Select Prepare Report Data.
This specifies that prepared report data output will be put in
the Data directory for the project.
b. Select Retrieve Report Data, and specify the maximum file
size to retrieve.


The output file will be copied to the location specified in the
run profile for local report data.
c. (Optional) From the Extract list select the extract file you
want to use for report data.
See Chapter 15, “Working with QualityStage Reports”, for more
information about preparing data for formatted reports.
7. (Optional) Click Advanced Run Options to see other options you
can set, depending upon:
• The job you are running
• The profile you are using
Select the options you need. For information about the advanced
run options, see “Advanced Run Options” on page 8-3.

Match job options
When you run a Match job, the following screen appears:

You can set the following parameters:


• Match Parameters. For information about setting Match
parameters, see “Setting Match Parameters” on page 12-48.
• Match Passes. For information about setting match passes,
see “Match reports and extracts” on page 12-47.


• Repeat Passes. For information about setting repeat passes,
see “Match reports and extracts” on page 12-47.
For information on building a Match Stage report or extract, see
“Working with Match Reports” on page 13-1.

Run mode
8. Click one of the following:


• Execute Data Stream Mode
• Execute File Mode
• Execute Parallel Extender Mode
See “About Run Modes” on page 7-2 for more information on the
three run modes.

Note: In Parallel Extender mode, if you intend to run a project built
with an earlier version of QualityStage (or INTEGRITY), you
must deploy the project using Parallel Extender mode before
you can run it.

Running in Data Stream Mode or Parallel Extender Mode


If you clicked either:
• Execute Data Stream Mode, or
• Execute Parallel Extender Mode
the Job Run Options dialog box closes.
If you selected Wait for Completion or are using a Windows local
server, status messages appear in the Status window.
When the job finishes running, a message like this one appears:


After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.

Running in File Mode


If you clicked Execute File Mode, the File Mode Execution screen
appears. It looks like this:

The File Mode Execution dialog box lists all the stages in the job. You
must run the same stages that you selected when deploying the job.
See “Deploying Jobs in File Mode” on page 7-7 for more information.


By default, all stages listed are run from first to last. However, you
can select a subset of stages to run. For information about how to
select a subset, see “To select a subset” on page 8-7.

1. Under Select Run Options:


a. Select Deploy, Run, or both.
When you select Run, QualityStage by default runs all passes.
See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.

Match reports and extracts
b. (Optional) Select the Report check box if you want to generate
a Match report.
If you select Report, select the passes for which you want to
run a report. Undup Independent reports only the last pass
run.
c. (Optional) Select the Extract check box if you want to generate
a Match extract.
If you have previously deployed and run your Match stage, you
can select Report, Extract, or both. Otherwise you must deploy
and run the job to generate a report or extract.
Once you have run your Match stage, you can generate a
report or an extract any time without also deploying and
running the stage.
See “Working with Match Reports” on page 13-1 for more
information about Match reports and extracts.

Formatted reports
d. (Optional) Select Prepare Report Data.
This specifies that prepared report data output will be put in
the Data directory for the project.
e. (Optional) Select Retrieve Report Data, and specify the
maximum file size to retrieve.
The output file will be copied to the location specified in the
run profile for local report data.
f. (Optional) From the Extract list select the extract file you
want to use for report data.


See Chapter 15, “Working with QualityStage Reports”, for more
information about preparing data for formatted reports.

To run the job
2. Click Run From Start to End.
The progress of the run is noted in the Status box.
When the run has successfully finished, a message like this one
appears:

After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.

Setting Match Parameters


Depending on the amount of data that you are processing with a
Match stage, you might need to modify some of the default processing
parameters. These parameters include:
• Buffer sizes for specific Match stages.
• Size of the frequency file.
• Location of the output directory for the match sort.

Buffer sizes
By default, the Match stage uses 2000 buffers of 1024 bytes each (about
2 megabytes in total) of memory or disk space, depending on the server,
for most of its processing. You might want to change the number of
buffers used by the Match stage if you have a server with little
memory, or if you have very large data files and the server has
extensive memory.
Review the summary statistics section of the statistics report to
determine whether you need to alter the number of buffers allocated.
See “Summary Statistics Section” on page 13-30.


Frequency file size
By default, QualityStage includes up to 100 entries in a frequency file,
which means that for any field requiring frequency analysis, the 100
most frequent occurrences are included in the frequency file. You can
use the Maximum Frequency Entry field to increase the maximum
number of entries. You may want to do this if you are processing large
numbers of records.
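
What a capped frequency file keeps can be illustrated with a few lines of Python. The function frequency_entries and its max_entries parameter are hypothetical stand-ins for the Maximum Frequency Entry setting, not part of QualityStage.

```python
from collections import Counter

def frequency_entries(values, max_entries=100):
    """Return the most frequent values and their counts, capped at max_entries."""
    return Counter(values).most_common(max_entries)

surnames = ["SMITH"] * 3 + ["JONES"] * 2 + ["LEE"]
top_two = frequency_entries(surnames, max_entries=2)
```

Here LEE falls outside the cap and is not recorded, which is the situation a larger maximum avoids when a field has many distinct values.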

Presort output location
The Match stage sorts the data file before each match pass. If you
have limited space where your data files are located, you can specify
another location for the sorted files.

Match debug file
You can request the statistics report by specifying the base
file name in the Match Debug File field. QualityStage appends an
underscore ( _ ), the pass number, and .OUT to this file name, and
creates the file in the Script directory. You can specify a full path if
you want the report in a different location.

Note: On an OS/390 server, the statistics are always generated to
sysprint.

Using an XML Match Spec to Create a Match


An XML Match Spec is an XML file that can be imported into
QualityStage and used to create a Match stage.
Unlike templates, which are empty of specific match information, XML
Match Specs already contain all the information in a match.
QualityStage includes a library of XML Match Specs that mirror the
five available default matching templates. The types of matches
available in the library include:
• Individual householding match
• Individual unduplication match
• Individual customer match
• Business unduplication match
• Business householding match


How Can I Use an XML Match Spec?


You can add XML Match Specs that have been edited for specific
matches to the QualityStageDesigner70/ObjectRepository/MatchSpec/Working
directory in addition to the template XML Match Specs that are
included in your QualityStage installation. You can then add any
XML Match Specs in the ObjectRepository directory to any project. This
allows you to share Match stages between projects without having to
import an entire project.
For more information on creating your own Match Specs see “Creating
Your Own XML Match Spec Files” on page 12-51.

Important: QualityStage only imports XML Match Specs; it does not produce
them.

Selecting an XML Match Spec

1. In the Select Match Build Method dialog box, click Match Spec
Library.


2. In the Match Spec Library dialog box, select the Match spec you
want to use to create your match.

3. Click Next to begin importing the match definition.


The Match Wizard – Match Specifications screen appears. You can
now edit the imported match definition.
4. When you finish defining your match, do one of the following:
• Click Finish to save the Match stage and exit the dialog box.
• Click Cancel to exit the dialog box without saving the Match
stage.

Creating Your Own XML Match Spec Files


The structure of the XML Match Spec file is defined by the Match Spec
DTD, which is described later in this chapter.
The Match Spec Library is a collection of XML Match Specs that
reside in the ObjectRepository directory in your QualityStage Designer
installation.


Limitations
All characters that have special meaning in the XML language, such as
the < and > characters, must be escaped.
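
As a hedged illustration of escaping, the Python standard library function xml.sax.saxutils.escape replaces the ampersand and angle brackets with XML entities. The sample text is made up, and whether QualityStage accepts escaped entities in every element is not confirmed here; the glossary later in this chapter says that description elements cannot include these characters at all.

```python
from xml.sax.saxutils import escape, unescape

raw_text = "Name & Address <pass 1>"
escaped_text = escape(raw_text)        # &, <, and > become entities
round_trip = unescape(escaped_text)    # converts the entities back
```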

XML Overview
The table below describes the most important elements in the
QualityStage XML Match Spec.

Element         Description
MATCHSPEC       This is the main element in the DTD. It defines all of
                the parameters in the match.
MATCHPASS       This element defines the parameters for a particular
                match pass. If you have multiple passes, you may have
                multiple entries for this element.
BLOCKSPEC       This element describes the blocking for a particular
                match pass.
MATCHCOMMAND    This element describes all of the parameters of the
                match itself.
VARTYPE         This element describes the parameters for the vartype
                for the match.

Editing XML Match Spec Templates


The easiest way to create new XML Match Specs is to edit the existing
templates.

1. Open the directory where you installed QualityStage Designer.
The default directory is C:\Ascential\QualityStageDesigner<version>, for
example C:\Ascential\QualityStageDesigner70. The ObjectRepository
directory is located here.
2. Open one of the existing XML Match Spec templates in an ASCII
editor.
3. Save the XML Match Spec template with a different name.


4. Edit the necessary elements so that the XML Match Spec reflects
the match you want to create.

Note: If the MATCHPASS description is longer than 40 characters,
QualityStage truncates it.

5. Save the file with your changes in the ObjectRepository directory.
You can now select the new XML Match Spec in QualityStage
Designer.

XML DTD
<!ELEMENT MATCHSPEC (MATCHDESCRIPTION, MATCHPASS+,
VARTYPE*) >
<!ATTLIST MATCHSPEC UNDUP (Y | N) #REQUIRED>
<!ATTLIST MATCHSPEC TWOFILE (Y | N) #REQUIRED>
<!ELEMENT MATCHDESCRIPTION (#PCDATA) >
<!ELEMENT MATCHPASS (DESCRIPTION, BLOCKSPEC+,
MATCHCOMMAND+, MATCHCUTOFF, CLERICALCUTOFF,
DUPLICATECUTOFF?) >
<!ELEMENT DESCRIPTION (#PCDATA) >
<!ELEMENT BLOCKSPEC (FIELD1, FIELD2?) >
<!ATTLIST BLOCKSPEC BSCOMPARE (CHARACTER | NUMERIC)
#REQUIRED>
<!ELEMENT MATCHCOMMAND (MCCOMPARISON, (ARRAYFIELD1 |
FIELD1)+, (ARRAYFIELD2 | FIELD2)*, MPROB, UPROB, PARAM1?, PARAM2?,
WEIGHTOVERRIDE*) >
<!ELEMENT MATCHCUTOFF (#PCDATA) >
<!ELEMENT CLERICALCUTOFF (#PCDATA) >
<!ELEMENT DUPLICATECUTOFF (#PCDATA) >
<!ELEMENT FIELD1 (#PCDATA) >
<!ELEMENT FIELD2 (#PCDATA) >
<!ELEMENT MCCOMPARISON (#PCDATA) >
<!ELEMENT MPROB (#PCDATA) >
<!ELEMENT UPROB (#PCDATA) >
<!ELEMENT PARAM1 (#PCDATA) >


<!ELEMENT PARAM2 (#PCDATA) >


<!ELEMENT WEIGHTOVERRIDE (AW?, DW?, AV?, BV?, AM?, BM?, XM?) >
<!ATTLIST WEIGHTOVERRIDE WOTYPE (ADD | REPLACE | MULTIPLY)
#REQUIRED>
<!ELEMENT AW (#PCDATA) >
<!ELEMENT DW (#PCDATA) >
<!ELEMENT AV (#PCDATA) >
<!ELEMENT BV (#PCDATA) >
<!ELEMENT AM (#PCDATA) >
<!ELEMENT BM (#PCDATA) >
<!ELEMENT XM (#PCDATA) >
<!ELEMENT VARTYPE (FIELD1) >
<!ATTLIST VARTYPE ACTION (CRITICAL | CRITICALMISSINGOK | CLERICAL
| CLERICALMISSINGOK |
VARTYPE_NOFREQ | NOUPDATE | CONCAT) #REQUIRED>

XML Vocabulary Glossary


The following glossary is listed in the order in which elements would
be included in the XML file.

Element
    Description

<!ELEMENT MATCHSPEC (MATCHDESCRIPTION, MATCHPASS+, VARTYPE*) >
    The match spec object. This element defines all of the parameters
    of the match.

<!ATTLIST MATCHSPEC UNDUP (Y | N) #REQUIRED>
    Defines whether the match is an unduplication. This attribute
    cannot have an N value if the TWOFILE attribute also has an N
    value. All other combinations of the UNDUP and TWOFILE attributes
    are valid.

<!ATTLIST MATCHSPEC TWOFILE (Y | N) #REQUIRED>
    Defines whether the match is a single file match or a two-file
    match. This attribute cannot have an N value if the UNDUP attribute
    also has an N value. All other combinations of the UNDUP and
    TWOFILE attributes are valid.

<!ELEMENT MATCHDESCRIPTION (#PCDATA) >
    Defines the match description. The description can have any length.
    However, we recommend fewer than fifty characters, because you use
    this description to select the match in QualityStage. You cannot
    include characters in this description that are used in XML, such
    as <, >, and &.

<!ELEMENT MATCHPASS (DESCRIPTION, BLOCKSPEC+, MATCHCOMMAND+, MATCHCUTOFF,
    CLERICALCUTOFF, DUPLICATECUTOFF?) >
    Describes the parameters of the match pass.

<!ELEMENT DESCRIPTION (#PCDATA) >
    Defines the match pass description. QualityStage automatically
    truncates this description to under forty characters. You cannot
    include characters in this description that are used in XML, such
    as <, >, and &.

<!ELEMENT BLOCKSPEC (FIELD1, FIELD2?) >
    Describes the blocking fields for the match.

<!ATTLIST BLOCKSPEC BSCOMPARE (CHARACTER | NUMERIC) #REQUIRED>
    Defines the blocking comparison type.

<!ELEMENT MATCHCOMMAND (MCCOMPARISON, (FIELD1 | ARRAYFIELD1)+,
    (FIELD2 | ARRAYFIELD2)*, MPROB, UPROB, PARAM1?, PARAM2?, REVERSE?,
    WEIGHTOVERRIDE*) >
    Defines the match command parameters: the comparison type, the
    fields or arrays compared, and other matching parameters.

<!ATTLIST MATCHCOMMAND MCMODE (NONE | ZEROVALID | ZERONULL | EITHER |
    BASEDPREV) #REQUIRED >
    Defines the mode for comparison types that require a mode to be
    defined.

<!ELEMENT MATCHCUTOFF (#PCDATA) >
    Defines the cutoff number for the match.

<!ELEMENT CLERICALCUTOFF (#PCDATA) >
    Defines the clerical cutoff number for the match.

<!ELEMENT DUPLICATECUTOFF (#PCDATA) >
    Defines the duplicate cutoff number for the match.

<!ELEMENT FIELD1 (#PCDATA) >
    Defines a field from the first match file.

<!ELEMENT FIELD2 (#PCDATA) >
    Defines a field from the second match file. This element is not
    required for single file matches.

<!ELEMENT ARRAYFIELD1 (#PCDATA) >
    Defines an array from the first match file.

<!ELEMENT ARRAYFIELD2 (#PCDATA) >
    Defines an array from the second match file. This element is not
    required for single file matches.

<!ELEMENT MCCOMPARISON (#PCDATA) >
    Defines the comparison type of the match command.

<!ELEMENT MPROB (#PCDATA) >
    Defines the M-probability for the match. This must be an integer
    value: the value you would enter in QualityStage multiplied by
    1000. For example, to enter an M-probability of .9, include a value
    of 900.

<!ELEMENT UPROB (#PCDATA) >
    Defines the U-probability for the match. This must be an integer
    value: the value you would enter in QualityStage multiplied by
    1000. For example, to enter a U-probability of .01, include a value
    of 10.

<!ELEMENT PARAM1 (#PCDATA) >
    Defines parameter one for the match. This must be an integer value.
    This element is optional.

<!ELEMENT PARAM2 (#PCDATA) >
    Defines parameter two for the match. This must be an integer value.
    This element is optional.

<!ELEMENT REVERSE (#PCDATA) >
    Designates a match as a reverse match. This element is optional.

<!ELEMENT WEIGHTOVERRIDE (AW?, DW?, AV?, BV?, AM?, BM?, XM?) >
    Defines the parameters for weight overrides.

<!ATTLIST WEIGHTOVERRIDE WOTYPE (ADD | REPLACE | MULTIPLY) #REQUIRED>
    Defines the weight override type.

<!ELEMENT AW (#PCDATA) >
    Defines the agreement weight parameter. This element is optional.

<!ELEMENT DW (#PCDATA) >
    Defines the disagreement weight parameter. This element is
    optional.

<!ELEMENT AV (#PCDATA) >
    Defines the conditional FileA value parameter. This element is
    optional.

<!ELEMENT BV (#PCDATA) >
    Defines the conditional FileB value parameter. This element is
    optional.

<!ELEMENT AM (#PCDATA) >
    Defines the conditional FileA missing weight parameter. This
    element is optional.

<!ELEMENT BM (#PCDATA) >
    Defines the conditional FileB missing weight parameter. This
    element is optional.

<!ELEMENT XM (#PCDATA) >
    Defines the missing weight parameter for both files. This element
    is optional.

<!ELEMENT VARTYPE ((FIELD1 | ARRAYFIELD1)) >
    Defines the VARTYPE for the match.

<!ATTLIST VARTYPE ACTION (CRITICAL | CRITICALMISSINGOK | CLERICAL |
    CLERICALMISSINGOK | NOFREQ | NOUPDATE | CONCAT) #REQUIRED>
    Defines the type of action for the VARTYPE.
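
As a rough illustration of how the glossary elements fit together, the following Python sketch assembles a minimal single-file unduplication spec with the standard xml.etree.ElementTree module. It is not a QualityStage tool. The field names (LAST_NAME, FIRST_NAME), the comparison type CHAR, and the cutoff values are hypothetical placeholders. The MCMODE attribute is included only because the glossary lists it as required. Check all of these against an existing template before saving a file in the ObjectRepository directory.

```python
import xml.etree.ElementTree as ET


def to_spec_probability(probability):
    """Scale a probability such as 0.9 to the integer form used in the XML (900)."""
    return round(probability * 1000)


def build_match_spec(description, undup, twofile, passes):
    """Assemble a MATCHSPEC element tree in the order given by the glossary."""
    # The glossary states that UNDUP and TWOFILE cannot both be N.
    if not undup and not twofile:
        raise ValueError("UNDUP and TWOFILE cannot both be N")
    root = ET.Element("MATCHSPEC",
                      UNDUP="Y" if undup else "N",
                      TWOFILE="Y" if twofile else "N")
    ET.SubElement(root, "MATCHDESCRIPTION").text = description
    for match_pass in passes:
        mp = ET.SubElement(root, "MATCHPASS")
        ET.SubElement(mp, "DESCRIPTION").text = match_pass["description"]
        for field in match_pass["block_fields"]:
            block = ET.SubElement(mp, "BLOCKSPEC", BSCOMPARE="CHARACTER")
            ET.SubElement(block, "FIELD1").text = field
        for field, m_prob, u_prob in match_pass["commands"]:
            # MCMODE="NONE" and the CHAR comparison type are assumptions.
            cmd = ET.SubElement(mp, "MATCHCOMMAND", MCMODE="NONE")
            ET.SubElement(cmd, "MCCOMPARISON").text = "CHAR"
            ET.SubElement(cmd, "FIELD1").text = field
            ET.SubElement(cmd, "MPROB").text = str(to_spec_probability(m_prob))
            ET.SubElement(cmd, "UPROB").text = str(to_spec_probability(u_prob))
        ET.SubElement(mp, "MATCHCUTOFF").text = str(match_pass["match_cutoff"])
        ET.SubElement(mp, "CLERICALCUTOFF").text = str(match_pass["clerical_cutoff"])
    return root


spec = build_match_spec(
    "Customer undup",
    undup=True,
    twofile=False,
    passes=[{
        "description": "Pass 1 - last name block",
        "block_fields": ["LAST_NAME"],
        "commands": [("FIRST_NAME", 0.9, 0.01)],
        "match_cutoff": 10,      # hypothetical cutoff values
        "clerical_cutoff": 5,
    }],
)
print(ET.tostring(spec, encoding="unicode"))
```

The helper to_spec_probability applies the "value * 1000" rule described for MPROB and UPROB, so .9 becomes 900 and .01 becomes 10.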


13

Working with Match Reports

Creating re-engineered data is a four-phase process, as discussed in
Chapter 2 “The Workflow for Creating Re-engineered Data”. The
fourth phase is where you evaluate the results and determine your
next step.
This chapter discusses the reports generated by Match stages and how
you can use them to evaluate your results.

Figure 13-1 Phase Four: Evaluate The Results

The Match stage can perform multiple matching passes. You may find
it useful to evaluate the results of one pass before performing the next
pass.
For example, your plan may be to first perform a pass that matches on
social security number and then perform a pass that matches on date
of birth. However, the results of the first pass may be sufficient,
so that you do not need to perform the second pass, or they may
indicate that you need to choose a different field for the second
pass.

The contents of the reports can help you choose the appropriate action.

About Match Reports


The Match stage generates reports on the results of any or all
completed passes when you specify Report in the File Mode Execution
dialog box. When you run a report, the Match stage writes the
requested records to a file, which you can view and print using a text
editor, or preview using QualityStage’s Report Viewer.
You can generate a report for all passes or for selected passes at any
time after you have staged and run your Match job.
Match reports require a report specification that defines the report
content and layout. QualityStage provides a default report
specification. Optionally, you can customize the reports for a specific
stage.
This section describes the default report specifications and provides
instructions on customizing reports.

Note: You can generate a default or custom Match report by running a
job in file mode only. Data stream mode does not generate Match
reports.

Using the Default Report


When you run a job, Match creates a default report specification file
that is named after the job, with .DEF appended. For example, if you
have a job named TEST, the report specification is TEST.DEF. On an
OS/390 server, this report specification is created as a data set; on
a UNIX or Windows server, the specification is located in the Controls
directory for your project.

For Match and Geomatch runs, the default report specification
includes the following result data:
• Match records.
• Clerical review records.
• Duplicate records on FileA.
• Duplicate records on FileB.
For Undup runs, the default report specification includes the following
result data:
• Match records.
• Clerical review records.
• Duplicate records on FileA.
For Undup Independent runs, the default report specification includes
GROUPALL records.
These default reports display the fields in the order that you specified
for matching.
The default report files are created with the name of the job with a
.RPT appended. On an OS/390 server, the files are created as a data
set. On a UNIX or Windows server, the files are created in the Controls
directory for your project.

Customizing a Match Report


When you customize a report, you have the option of displaying up to
six different types of record, including the residual records and
grouping the duplicate records for Undup runs. To customize a report,
you need to define the files to be used for the reports.
When you define these files, you need to specify only a name and a
description for each file, using the Add a New Datafile dialog box. You
can use one or more output files for your report types. The Match
stage stores your custom reports files in the same location as any
other output file; for example, on a UNIX server in the Data directory
of your project files.

To customize a Match report, you:


• Define the report
• Specify the report layout

Defining a Custom Report


Before you customize a report, make sure that you have defined all of
the output files to which you want to generate the reports.
To define a custom report:

1. In the Match Wizard – Match Specifications dialog box, click
Report.
The Match Report Stage Wizard appears. Its contents depend on the
match option you specified: one version appears for the Match,
Match Sets, Geomatch, Geomatch Multiple, and Geomatch Duplicates
options, and another version appears for the Undup and Undup
Independent options.

2. Under Select Outputs, select the output file for the report type.


3. Click one of the following report types:

Report type Prints


MATCH The matched records for both FileA and FileB for a
Match or Geomatch run and the master records for an
Undup run.
CLERICAL The clerical review records for both FileA and FileB for
Match or Geomatch runs and the duplicates that fall in
the clerical range for Undup runs.
DUPA The duplicates on FileA for a Match or Undup run. (A
Geomatch run has no duplicates on FileA.) If you use
the MATCH file, the duplicates appear just below their
associated matches.
DUPB The duplicates on FileB for a Match, Geomatch
Duplicate or Multiple run. If you use the MATCH file,
the duplicates appear just below their associated
matches.
RESA The residual records (nonmatches) on FileA. For Undup
runs, residuals are the master records and the
unassociated records. This results in printing one line
for each unique record in the file.
RESB The residual records (nonmatches) on FileB. Only valid
for Match runs. (For a Geomatch run, all FileB records
are residuals.)
GROUP The master records and their associated duplicates from
all passes grouped together. This report type is available
only for Undup runs.
This type ensures that the master records and their
associated duplicates appear only once, rather than for
each pass.
GROUPALL The master records and their associated duplicates from
all passes grouped together. The records are arranged
into sets with the unassociated records (residuals)
printed at the end. This report type is available only for
Undup runs.

Note: If you selected Undup Independent, only ResA, Group, and
GroupAll are available.

4. To use a different output file for another report, select the file from
the Select Outputs list, and then click the report type.
5. After defining all reports, click Next.
The Match Report Specification dialog box appears.

Specifying the Report Layout


You define the record layout for each report with the Match Report
Specification dialog box. You use statements to specify the print
layout for each line type. These statements include one or more
arguments, which must be separated by a space. Statements must be
in uppercase.
Reports consist of three types of line: headers, an A line, and a B line.
The headers label the columns. All MOVE Literal statements are
automatically entered into the HEADER lines. The A line shows the
records and fields on FileA. The B line is for FileB or duplicates for an
Undup run. You can specify any field that has been defined for the
files, including those not used for blocking or matching.

At the bottom of the dialog box, you can change:


• The number of Lines per Page.
• The Order by Weight (descending) setting, which lists high-weight
matches first.
For each report type you specified with the Report Wizard, a tab is
provided. You must specify the record layout in the first tab, which is
usually the Match tab. All subsequent report types use the layout
from the first tab unless you specify a different layout for the
report type.
You can either type the statements in directly or use the lists and
Insert buttons at the top of the dialog box. If you use the drop-down
lists and buttons, you still need to enter more information. Once you
have finished creating your statements, you can copy, paste, or cut
statements between reports using the CTRL+C, CTRL+V, and
CTRL+X keyboard shortcuts. You can also manually drag and drop
statements into the desired position within a report. You can specify
the following four statements:
• LOW
• HIGH
• MOVE
• MOVELR
The remainder of this section describes the statements and their
arguments.
The LOW and HIGH statements should be the first in the report
specification. These statements limit the report to records within the
specified weight range. These statements have the following format:

Statement To
LOW weight Specify the lowest weight to appear on report.
HIGH weight Specify the highest weight to appear on report.

With the following example, the report would show only those records
with weights within the range of 3.0 to 9.0.
LOW 3.0
HIGH 9.0
If you specify only LOW or only HIGH, the range is open-ended at the
other end. If you specify neither, all records are reported. Weights
do not apply to residual records.
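
The effect of LOW and HIGH on record selection can be sketched in Python as follows. This is only an illustration of the rule described above, not QualityStage code. It assumes that both limits are inclusive, which the guide does not state explicitly, and the function name is hypothetical.

```python
def in_report_range(weight, low=None, high=None):
    """Return True if a record's weight falls inside the LOW/HIGH range.

    A limit of None stands for a statement that was not specified,
    which leaves that end of the range open.
    """
    if low is not None and weight < low:
        return False
    if high is not None and weight > high:
        return False
    return True


weights = [1.5, 3.0, 7.2, 9.0, 12.4]

# LOW 3.0 and HIGH 9.0: only weights from 3.0 to 9.0 are reported.
print([w for w in weights if in_report_range(w, low=3.0, high=9.0)])

# LOW 3.0 only: the range is open-ended at the top.
print([w for w in weights if in_report_range(w, low=3.0)])
```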

The MOVE statement sets up the record format by moving field
values, special variables, and literals to the output record. The MOVE
statement uses one of the following three formats:
MOVE "literal" TO HEADER column

Argument     Description
"literal"    Any character string, which can be mixed case. Must be
             enclosed in quotation marks (" ").
TO HEADER    The header that is printed at the top of each page. Only
             literals should be moved to the header line. All MOVE
             Literal statements, when entered through the Add button,
             are automatically placed in the header.
column       The output position of the literal value in the header
             line. The first column is column 1, and report lines can
             be 150 characters long. For example, if a literal value
             “New Report” had a column value of 10, the literal value
             would start at column 10 in the header.

MOVE fieldname TO LINEA | TO LINEB column [length]

Argument     Description
fieldname    You can move an array or field. To move a field, click the
             Datafield A or Datafield B button and select the field
             from the display dialog box. To insert array fields, click
             the Arrayfield A or Arrayfield B button and select the
             array from the display dialog box.
             You can select any field from FileA and FileB, even if you
             did not use that field for blocking or matching.
TO LINEA     The line for fields from FileA. This line is not displayed
             for DUPA and RESA report types.
TO LINEB     The line for fields from FileB. This line is not displayed
             for DUPB and RESB report types.
column       The output position of the field on line A or line B. The
             first column is column 1, and report lines can be 150
             characters long.
length       The length of the field to be displayed. The default is
             the defined length of the field.

MOVE @variable TO LINEA | TO LINEB column

Argument     Description
@variable    One of the following variables, selected from the Special
             Variables drop-down list:
             @SET8, @SET9, @SET10
                 The group identifier. This is a number assigned to
                 each group by Match. Every record in a group receives
                 the same number. See the Note that follows this table.
             @WGT
                 The match weight assigned during the match process.
                 Available only for: MATCH, CLERICAL, DUPA, DUPB.
             @TYPE
                 The type of the record:
                 [MA] Match on FileA
                 [MB] Match on FileB
                 [CA] Clerical review on FileA
                 [CB] Clerical review on FileB
                 [DA] Duplicate on FileA
                 [DB] Duplicate on FileB
                 [RA] Residual on FileA
                 [RB] Residual on FileB
                 [XA] Master record in a set of duplicate records
                      (only Undup)
             @RECA8, @RECA9, @RECA10
                 The record number from FileA. See the Note that
                 follows this table.
             @RECB8, @RECB9, @RECB10
                 The record number from FileB. See the Note that
                 follows this table.
             @EXACT
                 The exact match flag for fields that had values and
                 matched exactly.
             @LR
                 The left/right match flag; only for Geomatch
                 comparisons of double intervals that set the
                 left/right flag.
TO LINEA     The location on line A for the variable. This line is not
             displayed for DUPB and RESB report types.
TO LINEB     The location on line B for the variable. This line is not
             displayed for DUPB and RESB report types.
column       The output position of the variable on line A or line B.
             The first column is column 1, and reports can be 150
             characters long.

Note: The number appended to SET, RECA, and RECB indicates the
number of bytes allocated. For example, SET10 uses 10 bytes. It
is strongly recommended that when processing more than 100
million records, you use either 9- or 10-byte variables.
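
The recommendation in the Note can be sketched as a small Python helper that picks the narrowest suffix able to hold a given record count. The sketch assumes that each allocated byte holds one decimal digit, which is an inference from the 100 million threshold rather than something the guide states. The function name is hypothetical.

```python
def smallest_variable_width(record_count):
    """Return 8, 9, or 10: the narrowest SET/RECA/RECB suffix that can hold
    record_count, assuming one decimal digit per allocated byte."""
    digits = len(str(record_count))
    for width in (8, 9, 10):
        if digits <= width:
            return width
    raise ValueError("record count does not fit in a 10-byte variable")


# Under this assumption, 8 bytes cover counts up to 99,999,999, so a
# run of 250 million records would call for @SET9 or @SET10.
print(smallest_variable_width(250_000_000))
```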

Tip: Generally, you should format LINEA and LINEB statements one
after the other so that you can see how each field compared.

The following example statements specify a report with the type of
record appearing in column 1, the first name starting in column 10,
and the last name starting in column 25, with the record from FileA
appearing first, followed by the record from FileB:
MOVE "TYPE" TO HEADER 1
MOVE @TYPE TO LINEA 1
MOVE @TYPE TO LINEB 1
MOVE "FIRST NAME" TO HEADER 10
MOVE FIRST_NAME TO LINEA 10 14
MOVE FIRST TO LINEB 10 14
MOVE "LAST NAME" TO HEADER 25
MOVE LAST_NAME TO LINEA 25
MOVE LAST TO LINEB 25
Notice that the fields for first name specify a length of 14. This
overrides the length defined for these fields.
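
To visualize how column and length arguments position values on the header, A, and B lines, the following Python sketch lays out the example statements by hand. It only mimics the column arithmetic and is not how QualityStage renders reports. The record values and the helper names place and render are hypothetical.

```python
from typing import Optional

LINE_WIDTH = 150  # report lines can be 150 characters long


def place(line: str, text: str, column: int, length: Optional[int] = None) -> str:
    """Overlay text on a fixed-width line, starting at a 1-based column."""
    if length is not None:
        text = text[:length].ljust(length)  # truncate or pad to the given length
    start = column - 1
    padded = line.ljust(LINE_WIDTH)
    return (padded[:start] + text + padded[start + len(text):])[:LINE_WIDTH]


def render(rec_a: dict, rec_b: dict) -> list:
    """Build the header, A line, and B line for the example statements."""
    header = place("", "TYPE", 1)
    header = place(header, "FIRST NAME", 10)
    header = place(header, "LAST NAME", 25)
    line_a = place("", rec_a["@TYPE"], 1)
    line_a = place(line_a, rec_a["FIRST_NAME"], 10, 14)
    line_a = place(line_a, rec_a["LAST_NAME"], 25)
    line_b = place("", rec_b["@TYPE"], 1)
    line_b = place(line_b, rec_b["FIRST"], 10, 14)
    line_b = place(line_b, rec_b["LAST"], 25)
    return [text.rstrip() for text in (header, line_a, line_b)]


# Hypothetical matched pair: FileA uses FIRST_NAME/LAST_NAME, FileB uses FIRST/LAST.
lines = render(
    {"@TYPE": "MA", "FIRST_NAME": "JOHN", "LAST_NAME": "SMITH"},
    {"@TYPE": "MB", "FIRST": "JON", "LAST": "SMITH"},
)
print("\n".join(lines))
```

Placing the A and B statements for each field at the same column, as the Tip suggests, keeps the compared values vertically aligned.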
The MOVELR statement moves a field depending on whether the left
or right flag is set from any interval comparison for parity match. You
use this statement for Geomatch runs against a Census Bureau Tiger
reference file or similar file.
The MOVELR statement uses the following format:
MOVELR left-field right-field TO LINEB column [length]

Argument Description
left-field The left field (such as ZIP or city) from FileB; only
appears if matched to the left interval.
right-field The right field from FileB; only appears if matched to the
right interval.
TO LINEB The line B for the field.
column The output position of the field. The first column is
column 1, and report lines can be 150 characters long.
length The length of the field to be displayed. The default is the
defined length of the field.

The following example moves the left Census Tract ID to the output
line if the type L field matched to the left interval, otherwise the right
Census Tract ID is moved to the same location:
MOVELR LEFT_TRACT RIGHT_TRACT TO LINEB 25 6
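
The selection that MOVELR performs can be sketched in Python as follows. This is an illustration only: the boolean matched_left stands in for the left/right flag, whose actual representation the guide does not describe, and the record values are hypothetical.

```python
def movelr(record_b, left_field, right_field, matched_left, length=None):
    """Pick the left or right FileB field depending on which interval matched."""
    value = record_b[left_field] if matched_left else record_b[right_field]
    if length is None:
        return value
    return value[:length].ljust(length)  # truncate or pad to the given length


# Hypothetical FileB reference record with left and right Census Tract IDs.
segment = {"LEFT_TRACT": "010100", "RIGHT_TRACT": "010200"}

print(movelr(segment, "LEFT_TRACT", "RIGHT_TRACT", matched_left=True, length=6))
print(movelr(segment, "LEFT_TRACT", "RIGHT_TRACT", matched_left=False, length=6))
```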

About Match Extracts


As a result of running the Match job, QualityStage creates an extract.
An extract contains records in the original file plus information on
which records are duplicates. You use the extract as input for the
Survive stage, as explained in Chapter 14 “Defining Survive Stages”.
When an extract is generated, Match writes the requested records to a
file, which you can use for subsequent stages of your data
re-engineering project.
You can generate an extract any time after you have staged and run
your Match job.
Match requires an extract specification that defines the content of the
extract file. QualityStage provides a default specification. Optionally,
you can create a customized specification for a job.
Match generates an extract file of any or all passes of a match run
when you either:
• Specify Extract in the File Mode Execution dialog box (when
running in file mode)
• Define a custom extract and run the job in data stream mode.
This section describes the default extract specifications and provides
instructions on customizing extract files.

Note: You can generate default extracts in file mode only. You can
generate custom extracts in file mode or data stream mode.

Using the Default Extract


When you run a job, Match creates a default extract specification file
that is named after the job, with .DEX appended. For example, if you
have a job named TEST, the extract specification is TEST.DEX. On an
OS/390 server, this extract specification is created as a data set; on
a UNIX or Windows server, the specification is located in the Controls
directory for your project.

For Match and Geomatch runs, the default extract specification
includes the following result data:
• Match records.
• Clerical review records.
• Duplicate records on FileA.
• Duplicate records on FileB.

For Undup runs, the default extract specification includes the
following result data:
• Match records.
• Clerical review records.
• Duplicate records on FileA.

For Undup Independent runs, the default extract specification
includes the following result data:
• Residual records.
• GROUPALL records.
The generated record layout is written to a file named
<projectname>.RLD. The default extract files are created with the
name of the job. For Match and Clerical extracts, .OUT is appended
to the file name. For the FileA residual records, .RAS is appended,
and for the FileB residual records, .RAB is appended.
On an OS/390 server, the files are created as a data set. On a UNIX or
Windows server, the files are created in the Controls directory for your
project.
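
The naming rules for the default extract files can be summarized with a short Python sketch. The function and dictionary keys are hypothetical. Only the suffixes come from the text above.

```python
def default_extract_names(job_name):
    """Map each default extract file to its name, using the suffixes in this section."""
    return {
        "specification": job_name + ".DEX",       # default extract specification
        "match_and_clerical": job_name + ".OUT",  # Match and Clerical extracts
        "residual_file_a": job_name + ".RAS",     # FileA residual records
        "residual_file_b": job_name + ".RAB",     # FileB residual records
    }


for role, name in default_extract_names("TEST").items():
    print(role, name)
```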

Customizing a Match Extract


When you customize an extract, you have the option of writing up to
six output files for six different record types, including the residual
records and grouping the duplicate records for Undup runs. To
customize an extract, you need to define the files to be used for the
extracts using the Data File Wizard.
When you define these files, you only need to specify a name and
description for each file using the Add a New Datafile dialog box. You
can use one or more output files for your extract types. Match stores
your customized files in the same location as any other output file; for
example, on a UNIX server in the Data directory of your project
directory.


To customize a Match extract, you:


• Define the extract using the Match Extract Stage Wizard.
• Specify the extract using the Match Extract Specification dialog
box.

Defining a Custom Extract


Before you customize an extract, make sure that you have defined all
of the output files to which you want to extract the data.
To define an Extract for a match:

1. In the Match Wizard – Match Specifications dialog box, click
Extract.
The Match Extract Stage Wizard appears. Its contents depend on the
match option you specified: one version appears for the Match,
Match Sets, Geomatch, Geomatch Multiple, and Geomatch Duplicates
options, and another version appears for the Undup and Undup
Independent options.

2. Under Select Outputs, select the output file for the extract type.


3. Click one of the following extract types:

Extract type Prints


MATCH The matched records for both FileA and FileB for a
Match or Geomatch run and the master records for an
Undup run.
CLERICAL The clerical review records for both FileA and FileB for
Match or Geomatch runs and the duplicates that fall
in the clerical range for Undup runs.
DUPA The duplicates on FileA for a Match or Undup run. (A
Geomatch run has no duplicates on FileA.) If you use
the MATCH file, the duplicates appear just below
their associated matches.
DUPB The duplicates on FileB for a Match, or Geomatch
Duplicate or Multiple run. If you use the MATCH file,
the duplicates appear just below their associated
matches.
RESA The residual records (non-matches) on FileA. For
Undup runs, residuals are the master records and the
unassociated records. This results in writing one line
for each unique record in the file.
RESB The residual records (non-matches) on FileB. Valid
only for Match runs. (For a Geomatch run, all FileB
records are residuals.)
GROUP The master records and their associated duplicates
from all passes grouped together. This extract type is
available only for Undup runs.
This type ensures that the master records and their
associated duplicates appear only once, rather than for
each pass.
GROUPALL The master records and their associated duplicates
from all passes grouped together. The records are
arranged into sets with the unassociated records
(residuals) at the end. This extract type is available
only for Undup runs.

Note: If you selected Undup Independent, only ResA, Group, and
GroupAll are available.

4. To use a different output file for another extract, select the file
from the Select Outputs list, and then click the extract type.
5. After defining all extracts, click Next.
The Match Extract Specification dialog box appears.

Specifying the Extract Layout


The Match Extract Specification screen lets you create statements for
generating formatted extracts. You use statements to specify the file
layout for one or more extract types.
For each extract type you specified in the previous Match Extract
Stage Wizard, a tab is provided here. You must specify the record
layout in the first tab, generally Match. All subsequent extract types
use that same extract layout. You need to specify record layouts for
subsequent extract types only if you want a different layout for the
extract type.

To specify different extract types:


• Click Back to return to the Match Extract Stage Wizard. Any
additions or modifications you made are retained.

File layout statements include one or more arguments, which must be
separated by a space. Statements must be in uppercase. You can
either type a statement directly in the Statement text box or use the
lists to help you generate a statement. The statements you create are
listed in the lower area of the screen. You can copy, paste, or cut
statements between extracts using the CTRL+C, CTRL+V, and
CTRL+X keyboard shortcuts. You can also manually drag and drop
statements into the desired position within an extract. See “Creating
Statements”.
To cancel defining or modifying extract layouts:

• Click Cancel to return to the Match Extract Stage Wizard. Any
additions or modifications you made are discarded.
To save the layouts:
• Click OK to return to the Match Wizard − Match Specifications
dialog box.

Creating Statements
You can easily generate a statement by selecting various options
available under the Enter the Data area at the top of the dialog box.
Depending on the statement type you select from the Statements list,
additional drop-down lists and/or text boxes display for each argument
you must specify for the selected statement.
To generate a statement:

1. From the Statements drop-down list, select the statement you
want to create.
For explanations about the available statements and their
arguments, see “Extract Statements and Arguments”.
2. For each argument that appears, select or enter the desired
setting.
3. Click Add to generate the statement and add it to the statement
list at the bottom of the screen.
Or, if you are familiar with the syntax of Match and Extract
statements used in report layouts, you can directly enter each
statement in the Statement text box.
To manually create a statement:

1. In the Statement text box, enter the statement and its arguments.
2. Click Insert to add it to the statement list at the bottom of the
screen.


Maintaining Statements
To remove a statement from the statement list at the bottom of the
screen:

1. Select the statement you want to delete.
2. Click Delete.
To modify a statement in the list:

1. Select the statement you want to modify.
2. Click Edit.
The statement is replicated under Enter the Data.
3. Edit the statement as needed.
4. To save your changes to the list, click Update.
5. To abort modifying the statement, click Clear.

Extract Statements and Arguments


MOVE-Literal
The MOVE-Literal statement sets up the record format by moving the
literal to the output record as a Header that is printed at the top of
each page. The MOVE-Literal statement uses the following format:
MOVE "literal" TO HEADER column

Argument Description
"literal" Any character string, which can be mixed case. Must be
enclosed in quotation marks (" ").
column The output Column position of the header on the report. The
first column is column 1, and report lines can be 150
characters long.

Note: If you want to insert a space between columns, use the
MOVE-Literal statement and use a character space (or as many
spaces as you wish to insert between columns) as the argument
for the literal, for example MOVE " ".

MOVE-Variable
The MOVE-Variable statement sets up the record format by moving
special variables to the output record. The MOVE-Variable statement
uses the following format:
MOVE @variable TO LINEA | TO LINEB column

Argument     Description
@variable    One of the following variables, selected from the Special
             Variables drop-down list:
             @WGT
                 The match weight assigned during the match process.
                 Available only for: MATCH, CLERICAL, DUPA, DUPB.
             @TYPE
                 The type of the record:
                 [MP] Match pair from FileA and FileB
                 [CP] Clerical review pair from FileA and FileB
                 [DA] Duplicate on FileA
                 [DB] Duplicate on FileB
                 [RA] Residual on FileA
                 [RB] Residual on FileB
                 [XA] Master record in a set of duplicate records
                      (only Undup)
             @RECA8, @RECA9, @RECA10
                 The record number from FileA. See the Note that
                 follows this table.
             @RECB8, @RECB9, @RECB10
                 The record number from FileB. See the Note that
                 follows this table.
             @EXACT
                 The exact match flag for fields that had values and
                 matched exactly.
             @LR
                 The left/right match flag; only for Geomatch runs.
             @PASS
                 The pass number of the Match run.
             @SET8, @SET9, @SET10
                 The set number of this group of records. A master
                 record and all of its duplicates receive the same
                 number. Residuals receive unique set numbers. See the
                 Note that follows this table.
             @SEQNUM
                 A unique sequence number for this match. In Geomatch
                 runs, each match receives a unique sequence number.
Position     The location on Line A or Line B for the variable. This
             line is not displayed for DUPB and RESB extract types.
Column       The output Column position of the variable on Line A or
             Line B. The first column is column 1, and reports can be
             150 characters long.

Note: The number appended to SET, RECA, and RECB indicates the
number of bytes allocated. For example, SET10 uses 10 bytes. It
is strongly recommended that when processing more than 100
million records, you use either 9- or 10-byte variables.
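
The sizing guidance in this note can be sketched as a small helper. This is only an illustration in Python, not part of QualityStage: the function name is hypothetical, and it assumes that each allocated byte holds one decimal digit, which is consistent with the 100-million-record threshold above but is not stated explicitly in this guide.

```python
def smallest_counter_variable(record_count, prefix="SET"):
    """Return the narrowest 8-, 9-, or 10-byte variable name that can hold
    record_count, assuming one decimal digit per allocated byte."""
    for width in (8, 9, 10):
        # The largest number that fits in `width` decimal digits.
        if record_count <= 10 ** width - 1:
            return f"@{prefix}{width}"
    raise ValueError("record count does not fit in 10 digits")


print(smallest_counter_variable(99_999_999))          # fits in 8 digits
print(smallest_counter_variable(100_000_000))         # needs 9 digits
print(smallest_counter_variable(5_000_000_000, "RECA"))
```

Under that assumption, an 8-byte variable runs out of room at 100 million records, which is why the wider variables are recommended for larger files.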

The destination of MOVE is always the next available output position
on the record. Match generates and writes a record layout for the
extract to a .RLO file named for the job. The record layout includes the
starting position and length of each column for the specified fields.

Note: If you want to insert a space between columns, use the
MOVE-Literal statement and use a character space (or as many
spaces as you wish to insert between columns) as the argument
for the literal: MOVE " ".

MOVE-FieldName
The MOVE-FieldName statement sets up the record format by moving
field values to the output record’s Line A. The MOVE-FieldName
statement uses the following format:


MOVE fieldname OF A | OF B column [length]

Argument    Description
fieldname   You can move an array or field. To insert a field, click the
            Datafield A or Datafield B button and select the field from
            the display dialog box. To insert an array field, click the
            Arrayfield A or Arrayfield B button and select the array
            from the display dialog box. You can select any field from
            FileA and FileB, even if you did not use that field for
            blocking or matching.
OF A        FileA for the fieldname.
OF B        FileB for the fieldname.
column      The output Column position of the field on line A. The first
            column is column 1, and report lines can be 150 characters
            long.
length      The Length of the field to be displayed. The default is the
            defined length of the field.

Note: The MOVE statement moves the data from FileA and FileB, but not
the field definitions. If you want these field definitions for the
Survive stage, you need to copy the field definitions before you
run the extract.

MOVELR
The MOVELR statement moves a field to the output record’s Line B,
depending on whether the left or right flag is set from any interval
comparison for parity match. You use this statement when executing
Geomatch (or Match) runs against a Census Bureau Tiger reference
file or similar file. The MOVELR statement uses the following format:


MOVELR left-field right-field OF B

Argument     Description
left-field   The Left Field from FileB only if matched to the left
             interval.
right-field  The Right Field from FileB only if matched to the right
             interval.
column       The output Column position of the field on line B. The first
             column is column 1, and report lines can be 150 characters
             long.
length       The Length of the field to be displayed. The default is the
             defined length of the field.

An example of MOVE statements:


MOVE @WGT
MOVE LAST_NAME OF A
MOVE FIRST_NAME OF A
MOVE LNAME OF B
MOVE FNAME OF B
MOVE "This is a test"

Note: If you want to insert a space between columns, use MOVE " "
(quote-delimited spaces).

In this example, the output record consists of:

The weight                    8276
The LAST_NAME from FileA      DUGGAN
The FIRST_NAME from FileA     MARY
The LNAME from FileB          DUGGAN
The FNAME from FileB          MARY
A literal                     "This is a test"

The record would look like:
8276DUGGANMARYDUGGANMARYThis is a test
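
The way each MOVE lands at the next available output position can be sketched in a few lines of Python. This is an illustrative model only, not QualityStage code: the function name and the dictionary of sample values are hypothetical, and the values are taken without padding, exactly as they appear in the example above. The layout it returns mirrors the kind of information the .RLO file records (starting position and length), but the real file format is not reproduced here.

```python
def build_extract_record(operands, values):
    """Append each MOVE operand at the next available output position.

    operands: MOVE operands in statement order; an operand wrapped in
              double quotes is treated as a literal.
    values:   sample values for variables and fields (hypothetical data).
    Returns (record, layout); layout holds (operand, start, length)
    tuples with 1-based starting positions.
    """
    record = ""
    layout = []
    for operand in operands:
        if operand.startswith('"') and operand.endswith('"'):
            text = operand[1:-1]          # literal: strip the quotation marks
        else:
            text = values[operand]        # variable or field value
        layout.append((operand, len(record) + 1, len(text)))
        record += text
    return record, layout


values = {
    "@WGT": "8276",
    "LAST_NAME OF A": "DUGGAN",
    "FIRST_NAME OF A": "MARY",
    "LNAME OF B": "DUGGAN",
    "FNAME OF B": "MARY",
}
operands = ["@WGT", "LAST_NAME OF A", "FIRST_NAME OF A",
            "LNAME OF B", "FNAME OF B", '"This is a test"']
record, layout = build_extract_record(operands, values)
print(record)
for operand, start, length in layout:
    print(f"{operand:<18} start={start:<3} length={length}")
```

Because nothing separates adjacent values, a MOVE " " between fields is what keeps neighboring columns readable in the output.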
MOVEALL


The MOVEALL statement moves an entire record to the output file.
Use one of the two formats:

Statement Description
MOVEALL OF A Moves the entire FileA record to the output file.
MOVEALL OF B Moves the entire FileB record to the output file.

Using the Statistics Report


Another tool that you can use in the evaluation stage is the statistics
report. It provides:
• Run information on the match pass.
• Frequency information based on the weights for matched fields.
• Summary statistics.
• Histograms of weights.
To generate this report, specify the base filename in the Match
Settings dialog box. See “Setting Match Parameters” on page 12-48 for
details.

Run Information Section


The run information section provides the:
• Project name.
• Type of Match process, such as MATCH or GEOMATCH.
• Number of the pass.
• Key size—the total size in characters of the blocking variables for
this pass.
• Names of the fields specified for the blocking variables.
• Cutoff weights specified for the match, clerical, and duplicate
cutoff points.
• Listing of the match process, including the fields, u-probability,
m-probability, and parameters specified for the matching
variables.

Frequency Information Section


The frequency information section provides for each matching
variable a table listing the values of each field that was matched. This
table includes the following information:

Item        Description
VALUE       The value whose weights are reported on this line.
FREQ        The number of times this value has occurred on both files.
MAgree      The number of times the value has agreed in matched pairs.
            For the initial run, this value is zero.
MPart       The number of times the value has participated in a matched
            pair. For the initial run, this value is zero.
mprob       The calculated m-probability used for this value. The
            m-probability is the accuracy of the particular value; for
            example, 0.9 means there is a 10% error for this value in
            the matched records.
uprob       The calculated u-probability used for this value. This
            probability is based on the frequency counts.
type        How the m-probability is determined, where D (default) and
            derived from a user-supplied value.
AGR WGT     The agreement weight, the weight assigned to a match on this
            value. Rare values have higher weights than more common
            values.
DIS WGT     The disagreement weight, the weight assigned if the value
            appears on one of the files and disagrees. Since two values
            are involved in a disagreement, the matcher uses the value
            from the table with the highest frequency (most common
            value).
MISS WGT    If one or both values are missing, the missing weight is
            used. This is the midpoint between the global agreement and
            disagreement weights. This is printed before the individual
            values for the variable are listed.
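
To make the relationship between the probabilities and the weights concrete, here is a minimal Python sketch. It assumes the standard probabilistic record-linkage (Fellegi-Sunter) formulas with base-2 logarithms; this section of the guide does not state the formulas that Match uses, so treat them as an assumption. Only the midpoint rule for the missing weight comes directly from the table above. All function names are hypothetical.

```python
import math


def agreement_weight(m, u):
    """Weight for agreement on a value (assumed formula: log2(m / u))."""
    return math.log2(m / u)


def disagreement_weight(m, u):
    """Weight for disagreement (assumed formula: log2((1 - m) / (1 - u)))."""
    return math.log2((1 - m) / (1 - u))


def missing_weight(global_agreement, global_disagreement):
    """Midpoint between the global agreement and disagreement weights."""
    return (global_agreement + global_disagreement) / 2


# A rare value (small u) earns a higher agreement weight than a common one.
print(agreement_weight(0.9, 0.01))
print(agreement_weight(0.9, 0.10))
print(disagreement_weight(0.9, 0.10))
print(missing_weight(4.0, -2.0))
```

Under these assumed formulas, lowering the u-probability raises the agreement weight, which matches the table's statement that rare values have higher weights than common ones.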

Summary Statistics Section


The summary statistics section summarizes the activity that occurred
during the match pass. This summary includes among other statistics:
• The number of records read on both files.
• The number of times the blocking variables have agreed.
• The number of residuals, that is, records on each file that remain
unmatched.
A record is skipped when the blocking variables do not agree. Records
are also skipped when there is a block overflow, which occurs when
the number of comparisons exceeds the maximum memory available.
The affected records on both files are skipped. You should keep block
sizes small enough so that block overflow does not happen. However, if
it does, subsequent passes should match these records. All skipped
records become residuals.
You can change the amount of memory allocated by Match. See
“Setting Match Parameters” on page 12-48 for instructions.


Histogram Section
Two histograms of the matching results are printed at the end of each
run. The first histogram shows the distribution of weights for all
comparisons. The second histogram shows the distribution of weights
for only paired records (that is, match, clerical, and duplicates).
Each histogram line indicates the weight (in 0.5 increments), the
frequency, and a graphic representation. The higher the weight, the
more confident you can be of a correct match. Lines ending with a right
bracket (>) exceed the range.
You can use the histogram to decide where the cutoff values should be.
In the example below, the match cases trail off around 6, and there are
some bumps below this value. For this example, you would make the
clerical review cutoff 6 and the unmatched cutoff 4.
An example is:
*
* HISTOGRAM
*
* Distribution of observed weights for all comparisons
* Scale based on mean frequency of: 8
*
* For weights with a frequency greater than the mean -
* The histogram shows an arrow in the last column
*
*   WGT   Freq
*
*  4.00      0
*  4.50      0
*  5.00      2 **
*  5.50      2 **
*  6.00      1 *
*  6.50      1 *
*  7.00      1 *
*  7.50      1 *
*  8.00      1 *
*  8.50      2 **
*  9.00      2 **
*  9.50      1 *
* 10.00      7 *******
* 10.50      3 ***
* 11.00      8 ********
* 11.50     11 ***********
* 12.00      9 *********
* 12.50      2 **
* 13.00      7 *******
*
* TOTAL COMPARISONS: 0000307
The frequencies for unmatched cases trail off as the weights go higher
and the frequencies for matched cases trail off as the weights go lower.
This forms two curves (or modes). These represent the unmatched and
the matched cases. The farther apart these are from each other, the
better the discrimination between the matched and unmatched
records. Try to draw a continuous curve from the histogram chart, and
examine the tails of the curve to decide where to make the cutoff
points.
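
If you export comparison weights and want to reproduce a similar chart outside QualityStage, the binning can be sketched as follows. This Python example is an approximation, not the product's algorithm: the function name, the max_width parameter, and the rule of truncating long bars with ">" are assumptions made for illustration, and the scaling by mean frequency mentioned in the report header is not modeled.

```python
import math
from collections import Counter


def weight_histogram(weights, step=0.5, max_width=40):
    """Return one text line per weight bin: weight, frequency, bar of '*'."""
    if not weights:
        return []
    # Integer bin indices avoid using floating-point numbers as keys.
    bins = Counter(math.floor(w / step) for w in weights)
    lines = []
    for idx in range(min(bins), max(bins) + 1):
        freq = bins.get(idx, 0)
        bar = "*" * min(freq, max_width)
        if freq > max_width:
            bar += ">"   # bar truncated: frequency exceeds the printable range
        lines.append(f"{idx * step:6.2f} {freq:6d} {bar}".rstrip())
    return lines


for line in weight_histogram([5.0, 5.2, 6.0, 10.1, 10.3, 10.4]):
    print(line)
```

Empty bins are printed with a frequency of zero so that gaps between the unmatched and matched modes stay visible, which is what you look for when choosing cutoffs.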

Viewing Extracts and Reports


For information on viewing your extracts and reports see “Using Stage
Wizards to Prepare Data for Predefined QualityStage Reports” on
page 15-2 and “Using the QualityStage Data File and Report Viewer”
on page 16-1.



14

Defining Survive Stages

Survivorship is the third step in Phase Three of the data
re-engineering workflow. Remember, Phase Three is about designing
and developing the data re-engineering application. Following the
four-stage process discussed in Chapter 2 will streamline your data
re-engineering implementation.

Figure 14-1 Phase Three: Design and Develop the Data Re-engineering
Application

Defining survivorship:
• Resolves multiple occurrences of records
• Defines the format of the surviving data


The appropriate resolution depends on:
• The business processes that use this data
• Your company’s business rules and data quality management
standards
You determine:
• Which records from the source data survive
• How that data is formatted
QualityStage includes the Survive stage for specifying survivorship
and data format criteria. This job can link duplicate records together
with a common key, or combine the data from multiple records to form
a single record.
This chapter explains how to use the Survive stage. It assumes that
you identified matching records as described in Chapter 12, “Defining
Match Stages”.

Using the Survive Stage


With the Survive stage, you specify which fields and field values from
the group of input records will create the output record. The output
record can include:
• An entire input record
• Selected fields from the record
• Selected fields from different records in the group

Fields and field values

You select field values based on rules for testing the fields. A rule
contains a set of conditions and a list of target fields. If a field tests
true against the conditions, the field value for that record becomes the
best candidate for the target. After testing each record in the group,
the fields declared best candidates combine to become the output
record for the group. Whether a field survives is determined by the
target. Whether a field value survives is determined by the rules.
Some approaches to selecting the best candidate are:
• Duration of record
• File from which the record originated (source)
• Length of data in the field
• Frequency of data in a group
To use a Survive stage, you perform the following general tasks:

1. Specify an input data file and an output data file.
2. Define the data file fields to write to the output record (the
target).
3. Decide whether to configure the Survive stage with:
• A predefined Technique
• A custom rule
4. Run the Survive stage against the data file.

Grouping Records
You often use the Survive stage after unduplicating a data file in
which you have identified groups of records that contain similar or
identical data. For example, the following portions of three records
have been identified as the same person, using different
representations of the name:
JOHN SMITH JR

JOHNNY SMITHE

JOHN E SMITH

Group identifier

Once you identify each group of records, you must assign an identifier
to the group. For example, the Match stage lets you include a set
number with each record in a group when you extract the data. See
“Defining a Match Stage” on page 12-11 for more information.

Sorting grouped records

The Survive stage sorts your input data file on the group identifier to
ensure that all records in the group are together. However, you cannot
control the order in which records appear in a group.
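
The effect of sorting on the group identifier can be illustrated with a short Python sketch. It is a conceptual model only: the field names SET and NAME, the sample set numbers, and the function name are hypothetical, and the Survive stage itself works on fixed-format files rather than dictionaries.

```python
from itertools import groupby
from operator import itemgetter


def group_by_identifier(records, key="SET"):
    """Sort records on the group identifier, then yield (identifier, members)."""
    ordered = sorted(records, key=itemgetter(key))
    for set_id, members in groupby(ordered, key=itemgetter(key)):
        yield set_id, list(members)


records = [
    {"SET": "00000002", "NAME": "MARY DUGGAN"},
    {"SET": "00000001", "NAME": "JOHN SMITH JR"},
    {"SET": "00000001", "NAME": "JOHNNY SMITHE"},
    {"SET": "00000001", "NAME": "JOHN E SMITH"},
]
groups = dict(group_by_identifier(records))
for set_id, members in groups.items():
    print(set_id, [m["NAME"] for m in members])
```

The sketch guarantees only that records sharing an identifier end up together. Rules should not depend on the order of records inside a group, because that order cannot be controlled.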


Defining Survive Stage Files


You define one input file and one results file for a Survive stage.

Defining the Input File


The data in the input file must be in fixed-format fields. The fields you
must define are the targets. You can include more than one field in a
target.
You must also define any fields that you want to use in testing with a
rule.
At least one field must include a group identifier.

Note: The field use type for each field in your input file must be
specified as either a string or an integer.

Defining the Results File


The results file definition must include at least the fields you defined
for the targets in the input data file definition.
To create this file, you can use the Data File Wizard to copy your input
file definition to the output file definition.

Creating a Survive Stage


Creating a Survive stage requires you to:
1. Define a new Survive stage, which specifies the input and output
files.
2. Use the Survive Rules Definition screen to define the target.
Adding a survivorship rule entails one of the following:
• Using a predefined Technique
• Creating a new custom rule
3. Use the Survive Rules Definition screen to define the survivorship
rule you need:
• Define a simple rule using a predefined Technique.
• Define a complex rule using the Survive Rule Expression
Builder.
4. Add the newly defined rule to the Survive Rules list.
The following sections describe these steps in detail.

Using the Survive Stage Wizard


To create a Survive stage:
1. On the left pane of the QualityStage main window, select a Stages
folder.
2. Do one of the following:

• From the Toolbar, choose ➤ Stage ➤ Survive.


• Right-click anywhere on the right pane, click New Stage, and
then click Survive.


The Survive Stage Wizard appears.

3. Under Name, enter up to eight alphanumeric characters.
4. Under Description, enter up to 40 characters.
5. Under Options, select one of the following:
• Pre-Sort Input Datafile. This option sorts your input data
file before the stage runs.
• Do Not Pre-Sort Input Datafile. This option skips sorting your
input data file before the stage runs. If your file is already
appropriately sorted, this can speed up processing time.
6. (Optional) If you want to generate a predefined QualityStage
report for this job, select the Map Input Fields for Report Use
check box.
For information about predefined QualityStage reports, see
“Using Stage Wizards to Prepare Data for Predefined
QualityStage Reports” on page 15-2.
7. Under Data File, select the input file.
8. Under Results File, select the results file.
The Data File must be a different file from the Results File.
9. Click Next.

The Survive Wizard appears.

10. Select the Field Name that contains the group identifier you want.
11. Click Next.
The Survivorship Rules Definition Screen – SURVIVE appears.


Defining Survive Rules


The Survivorship Rules Definition Screen – SURVIVE lets you easily
add, delete, modify, and manage survivorship rules.

Defining Targets
Targets are the data file fields you want to write to the output record.
Fields you do not include as targets are excluded from the output
record. You can easily create targets for individual fields, groups of
fields, or the entire record using the Specify Output Fields area of the
Survivorship Rules Definition Screen – SURVIVE.


To define a target:

Under Available Fields, drag each Field Name you want to include to
the Targets list, or use the add and remove buttons to add or remove a
selected field from the Targets list.

Defining a Simple Rule


You can quickly and easily add simple rules using Techniques.
Techniques are commonly used survivorship rules that are
compressed into a single, descriptive name.
The following is a list of available Techniques and their associated
pattern, where <field> is the field to be analyzed and <DATA> is the
contents of the Data column:

Technique            Pattern
Shortest Field       SIZEOF(TRIM(c.<field>)) <= SIZEOF(TRIM(b.<field>))
Longest Field        SIZEOF(TRIM(c.<field>)) >= SIZEOF(TRIM(b.<field>))
Most Frequent        FREQUENCY
Most Frequent        FREQUENCY
[Non-blank]          (Skips missing values when counting most frequent.)
Equals               c.<field> = <DATA>
Not Equals           c.<field> <> <DATA>
Greater Than         c.<field> >= <DATA>
Less Than            c.<field> <= <DATA>
At Least One         1
                     (At least one record survives, regardless of other
                     rules.)

For example, the Longest Field technique on field LASTNAM is the
same as the survivorship language rule:
SIZEOF(TRIM(c.LASTNAM)) > SIZEOF(TRIM(b.LASTNAM));
For information about correct syntax and rules processing, see
“Creating Rules Syntax” on page 14-23.
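
The interplay between the current record (c.) and the best record (b.) can be modeled in Python to show how a Technique picks a value. This is a conceptual sketch under stated assumptions, not the Survive stage's implementation: all function names are hypothetical, the first record in a group is assumed to start as the best candidate, Python's strip() stands in for TRIM, and the comparisons follow the patterns in the Technique table.

```python
from collections import Counter


def survive_field(group, field, test):
    """Walk the group once; a record whose value passes the test against
    the current best value becomes the new best candidate."""
    best = None
    for record in group:
        current = record[field]
        if best is None or test(current, best):
            best = current
    return best


def longest_field(c, b):
    # Models SIZEOF(TRIM(c.<field>)) >= SIZEOF(TRIM(b.<field>))
    return len(c.strip()) >= len(b.strip())


def shortest_field(c, b):
    # Models SIZEOF(TRIM(c.<field>)) <= SIZEOF(TRIM(b.<field>))
    return len(c.strip()) <= len(b.strip())


def most_frequent_non_blank(group, field):
    """Most common value in the group, skipping blank values."""
    counts = Counter(r[field].strip() for r in group if r[field].strip())
    return counts.most_common(1)[0][0] if counts else ""


group = [
    {"LASTNAM": "SMITH   "},
    {"LASTNAM": "SMITHE  "},
    {"LASTNAM": "SMITH   "},
]
print(survive_field(group, "LASTNAM", longest_field).strip())
print(survive_field(group, "LASTNAM", shortest_field).strip())
print(most_frequent_non_blank(group, "LASTNAM"))
```

Because the comparison includes equality, a later record that ties with the best candidate replaces it in this model. That is one reason the result can depend on record order within a group, which you cannot control.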
To define a simple rule:

1. Under Survive Rule, select Analyze Field.
2. From the Analyze Field list, select the field you want to analyze. If
you select (Use Target), it is assumed that you want to use the
Target. If there are multiple target output fields, an error message
appears if you try to add the rule.
3. From the Technique list, select the Technique to apply to the
selected Field.
4. In the Data box, enter the value, field name, or expression.

Tip: You can use standard Windows shortcut keys for copy, cut, and
paste commands (Ctrl+C, Ctrl+X, and Ctrl+V) on the text in the
Data box.

If you need to create a more complex multifield or multiclause
survivorship rule than that provided by the available Techniques, you
must create a complex rule. See “Defining a Complex Rule” on page
14-11.
To add the rule, see “Adding the Rule” on page 14-13.

Defining a Complex Rule


To define a new complex rule, select Complex Survivorship Expression
under Survivorship Rule.

To enter the expression of a complex rule, type directly in the Complex
Survivorship box, or click Expression Builder to use the Survivorship
Rule Expression Builder screen to help you (for details see “Using the
Survivorship Rule Expression Builder” on page 14-12).
For information about correct syntax and rules processing, see
“Creating Rules Syntax” on page 14-23.

Tip: You can use standard Windows shortcut keys for copy, cut, and
paste commands (Ctrl+C, Ctrl+X, and Ctrl+V) on the text within
the Complex Survivorship Expression box.

Using the Survivorship Rule Expression Builder


The Survivorship Rule Expression Builder screen lets you create or
edit complex expressions and rules more easily:

1. In the Expression box, enter and edit the text.
To quickly insert a Field, Function, or Stage at the current text
cursor position in the Expression box, double-click the item in the
respective list.

Note: Remember, under Fields the prefix c. indicates the current
record, and the prefix b. indicates the best record.

Tip: You can use standard Windows shortcut keys for copy, cut,
and paste commands (Ctrl+C, Ctrl+X, and Ctrl+V) on the text
within the Expression box.

2. To ensure that your expression uses correct syntax, click Check
Expression to invoke the expression check tool.
If a problem is found, a message box appears indicating what
problem was found and at what character position the checker
believes the problem occurred. The checker then automatically
sets the text cursor at that location in the Expression box.
3. Edit the rule, if needed, to correct the problem and repeat Check
Expression.
4. Do one of the following:
• Click OK to save your changes.
As a safeguard, the expression checker automatically checks
the rule to ensure that it is syntactically correct before placing
it into the appropriate place in the Survivorship Rules
Definition Screen.
• Click Cancel to discard all changes made in this screen and
return to the Survivorship Rules Definition Screen.

Adding the Rule


To add a newly defined rule to the Survivorship Rules list at the
bottom of the screen, click Add Survivorship Rule.
After you finish adding to or modifying the Survivorship Rules list, do
one of the following:
• Click Finish to save all changes and return to the Stage Wizard.


• Click Next if you selected Map Input Fields for Report Use, to
display the Data Selection for Reports dialog box (see “Selecting
Data for a Predefined QualityStage Report” on page 14-15).
• Click Cancel to discard all changes made in this screen and return
to the Stage Wizard screen.

Important: If you click Finish with a partially completed rule, a warning
message appears. Do one of the following:
• Click No to return to the Survivorship Rules Definition Screen. You
can now finish defining the rule and add it to the Survivorship
Rules list.
• Click Yes to discard the unfinished rule and exit.
A warning message also appears if you try to exit after creating a rule
that has nonsensical elements in it (for example, a field-level
survivorship coming after a record-level survivorship).
See “Modifying and Maintaining Survivorship Rules” on page 14-16.

Tip: See “Creating rules similar to existing rules” on page 14-17.


Selecting Data for a Predefined QualityStage Report


Use the Data Selection for Reports dialog box to map fields for the
report:

1. Select the group identifier field, and then click to move the
field name next to Match Type.
2. Select the record number field, and then click to move the field
name next to Set Number.
3. Repeat steps 1 and 2 for up to six Name fields, up to six Address
fields, and up to six Area fields.
4. When all desired fields are selected, click Finish.


Modifying and Maintaining Survivorship Rules


The Survivorship Rules grid at the bottom of the screen lets you easily
maintain your rules. Using this grid, you can:
• Delete rules
• Modify rules
• Reorder the sequence
• Create new rules from existing ones
The gray column to the left of the grid is the selector column.

To work on a single rule, click anywhere in that rule’s row to select it,
or click on its selector box to highlight that row.
To select a group of consecutive rules, click on the first rule’s selector
box. Then Shift+Click on the last rule’s selector box; if necessary, use
the grid’s scroll bars to display the last rule before you Shift+Click.
To select a group of nonconsecutive rules, Ctrl+Click the selector box
of each rule you want to include.

Deleting rules

To delete rules:

1. Select the rules to delete.
2. Click Delete Survivorship Rule to remove the selected rules from
the list.

Using the grid to modify a rule

To modify an existing rule using the grid:

1. If you need to change the Target, Analyze Field, or Technique
value, click the rule’s column to select a new value from the list.

Note: Editing one cell in the grid can change other cells in the rule.
For example, if you modify a rule that has a complex
expression and then change its Technique to Shortest, the
complex expression no longer makes sense in this context.
The expression is cleared from the Data cell, and the Analyze
Field column is set to the Target.

2. To modify a rule’s Data entry, click on that cell. A button
appears at the right of the cell.
• Edit the entry directly within the cell, or
• Click the button to use the Survivorship Rule Expression Builder
screen to edit the expression.
See “Using the Survivorship Rule Expression Builder” on page
14-12.

Using Survivorship Rule to modify a rule

To modify an existing rule using the Survivorship Rule section at the
upper right of the screen:

1. Select the rule you want to modify.
2. Click Edit Survivorship Rule to temporarily move that rule from
the grid to the Survivorship Rule section.
3. Edit the rule as needed.
• Edit the entry directly within the text box, or
• Click Expression Builder to use the Survivorship Rule
Expression Builder screen to edit the expression.
See “Using the Survivorship Rule Expression Builder” on page
14-12.
4. Click Add Survivorship Rule to move the rule back to the
Survivorship Rules grid list.

Creating rules similar to existing rules

To create rules similar to existing ones:

1. Select the rules you want to duplicate.
2. Click Copy Survivorship Rule to add the duplicates to the bottom
of the list.
3. Edit each duplicated rule to create new rules.


Reordering rules

To reorder the sequence of rules:

1. Select the rules you want to move.
2. Click Move Up or Move Down as needed to move the selected rules
to the desired location.

Tip: You can easily reorder rules by using drag and drop. As you drag,
green and red arrows appear that indicate the position at which
the dragged rule is inserted when you release the mouse
button.

Running Survive Jobs


Add the stage to a job

Before you can use your Survive stage, you must add it to a job. You
can either:
• Add your Survive stage to an existing job, or
• Add a new job, and then add your Survive stage to it.
For information about adding new jobs, see “Creating a New Job” on
page 6-4. For information about adding stages to jobs, see “Adding
Existing Stages to a Job” on page 6-6.
After adding your Survive stage to a job, you can run it.

Setting up the file structure

If this is the first job that you have run in the project, you need to set
up the file structure for the project. See Chapter 5, “Setting Up Run
Profiles”, for instructions.
Once your data files are available to QualityStage, run the Survive
stage.

How to run the job

To run a Survive job:

1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. From the jobs list on the right pane, select the job you want to run.


3. Do one of the following:
• On the Toolbar, click the Run button.
• Right-click on the job you want to run, and then click Run.
The Job Run Options dialog box appears.
4. Select a run profile from the Profile list.
If you have not defined a run profile, click Setup. For information
about how to set up run profiles, see Chapter 5, “Setting Up Run
Profiles”.
5. Under Select Run Options, select Deploy, Run, or both.
See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.


6. (Optional) If you want to run a QualityStage formatted report
after you run the job, do the following:
a. Select Prepare Report Data.
This specifies that prepared report data output will be put in
the Data directory for the project.
b. Select Retrieve Report Data, and specify the maximum file
size to retrieve.
The output file will be copied to the location specified in the
run profile for local report data.
See Chapter 15, “Working with QualityStage Reports”, for more
information about preparing data for formatted reports.
7. (Optional) Click Advanced Run Options to see other options you
can set, depending upon:
• The job you are running
• The profile you are using
Select the options you need. For information about the advanced
run options, see “Advanced Run Options” on page 8-3.

8. Click one of the following run modes:
• Execute Data Stream Mode
• Execute File Mode
• Execute Parallel Extender Mode
See “About Run Modes” on page 7-2 for more information on the
three run modes.

Note: In Parallel Extender mode, if you intend to run a project built
with an earlier version of QualityStage (or INTEGRITY), you
must deploy the project using Parallel Extender mode before
you can run it.


Running in Data Stream Mode or Parallel Extender Mode


If you clicked either:
• Execute Data Stream Mode, or
• Execute Parallel Extender Mode
the Job Run Options dialog box closes.
If you selected Wait for Completion or are using a Windows local
server, status messages appear in the Status window.
When the job finishes running, a completion message appears.

After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.


Running in File Mode


If you clicked Execute File Mode, the File Mode Execution screen
appears.

The File Mode Execution dialog box lists all the stages in the job. You
must run the same stages that you selected when deploying the job.
See “Deploying Jobs in File Mode” on page 7-7 for more information.
By default, all stages listed are run from first to last. However, you
can select a subset of stages to run. For information about how to
select a subset, see “To select a subset” on page 8-7.

1. Under Select Run Options:


a. Select Deploy, Run, or both.


See “About Deploying and Running Jobs” on page 7-1 for more
information about deploying and running jobs.
b. (Optional) Select Prepare Report Data.
This specifies that prepared report data output will be put in
the Data directory for the project.
c. (Optional) Select Retrieve Report Data, and specify the
maximum file size to retrieve.
The output file will be copied to the location specified in the
run profile for local report data.
See Chapter 15, “Working with QualityStage Reports”, for more
information about preparing data for formatted reports.

To run the job
2. Click Run From Start to End.


The progress of the run is noted in the Status box.
3. When the run has successfully finished, a message like this one
appears:

After processing is finished, you can view the results. See “Viewing
Job Output Files” on page 8-13 and “Working with QualityStage
Reports” on page 15-1 for more information.

Creating Rules Syntax


A rule comprises one or more targets and a conditional expression that
must be true for the targets to be considered the best candidates for
the output record. A condition is made up of:
• Field names
• Constant or literal values
• Operators that specify comparison or arithmetic operations


You can create more than one rule for a target.


You can use any field defined in your input data file for testing the
condition. The field can be the target or another field that is not
associated with the target.
The field type, either string or integer, determines which operators
you can use in your rules. You must specify the type of field using
either the Add a New Datafield or Modify a Datafield dialog box from
the Data Field Wizard. You select either String or Integer under Field
Use Type.

Rule Format
The syntax for a rule is:
TARGETS: CONDITION;

The format for a rule is:


• Multiple targets are permitted for the same rule. A space separates
individual targets in a list; a colon (:) separates the list of targets
from the condition.
• Only one condition is permitted in a rule.
• Every field name must be prefixed with either a c. to indicate the
current record or b. to indicate the best record.
• Parentheses ( ) can be used for grouping complex conditions.
• Integer constants are indicated with the number, such as 7.
• String literals are enclosed in double quotation marks, such as
"MARS".
• A rule can extend over several lines.
• A semicolon (;) terminates a rule.


Rule Operators
Survive supports the following comparison operators for both string
and integer fields:

= Equal to

<> or != Not equal to

> Greater than

< Less than

>= Greater than or equal to

<= Less than or equal to

Survive supports the following arithmetic operators for integer fields:

+ Add

- Subtract

* Multiply

/ Integer division, which drops the remainder (9 / 4 = 2)

% Modulo division, which evaluates the remainder (9 % 4 = 1)
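
The two division examples can be checked with a short sketch. Python is used here only for illustration and is not part of QualityStage; its // and % operators give the same results as the guide's examples for non-negative integers. How Survive treats negative operands is not stated in this guide, so that case is not shown.

```python
# Illustration only: mirrors the guide's examples 9 / 4 = 2 and 9 % 4 = 1.
quotient = 9 // 4    # integer division drops the remainder
remainder = 9 % 4    # modulo division keeps only the remainder

print(quotient, remainder)
```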

Survive supports the following logical operators:


AND Binary logical “and” (expression1 AND
expression2 is true if both expressions are
true).
OR Binary logical “or” (expression1 OR expression2
is true if either expression is true).
NOT Unary logical “not” (NOT expression1 is true if
expression1 is false).


Survive also supports the following functions:


SIZEOF The number of characters, including spaces, in
a string-type field; the number of decimal
digits in an integer-type field
TRIM Strip leading and trailing spaces from
string-type fields
FREQUENCY The most frequent value. Note the following
conditions:
• You cannot use FREQUENCY with other
expressions (no AND or OR operators).
• You cannot specify tie-breaker conditions;
the last record in a group is selected.
• The data must be normalized beforehand, because
values are not trimmed before they are compared;
thus ' apple ' is not the same as 'apple'.
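
The FREQUENCY conditions above can be sketched in Python (used here for illustration only; this is not QualityStage code). The function name most_frequent is hypothetical. The sketch assumes one reading of the tie-breaker note: when several values are equally frequent, the value that occurs latest in the group is kept. Values are compared exactly, without trimming.

```python
from collections import Counter

def most_frequent(values):
    """Return the most frequent value in a group, or None if it is empty.

    Assumption: on a tie, the tied value that occurs latest in the
    group wins. Values are compared exactly, with no trimming.
    """
    if not values:
        return None
    counts = Counter(values)
    top = max(counts.values())
    # Scan from the end so the latest tied value is returned.
    for value in reversed(values):
        if counts[value] == top:
            return value

group = ["apple", " apple ", "apple", "pear"]
print(repr(most_frequent(group)))
```

Because ' apple ' and 'apple' are counted as different values, untrimmed data can change which value is the most frequent.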

Rule Processing
The Survive stage reads the first record in a group and evaluates the
record against all the rules. The fields for this record are the current
fields. For the first record in a group, there are no best fields. All rules
for each target are evaluated against the fields in the record. If any
target passes the test, its data fields become the best fields.
The job then evaluates each subsequent record in the group; each one is
the current record while it is evaluated. When a target passes the test,
its data fields become the best fields, replacing any existing best
fields. If none of the current fields meet the conditions, the best fields
remain unchanged.
After all records in the group are evaluated, the values designated as
the best values combine to become the output record. Survive
continues the process with the next group.


For example, the following rule states that FIELD3 of the current
record should be retained if the field contains five or more characters
and FIELD1 has any contents.
FIELD3: (SIZEOF (TRIM c.FIELD3) >= 5) AND (SIZEOF (TRIM
c.FIELD1) > 0) ;
The first group has the following three records:

Record   No. of characters in FIELD1   No. of characters in FIELD3
1        3                             2
2        5                             7
3        7                             5

The first record in the group has two characters in FIELD3 and three
characters in FIELD1. This record fails the test, because FIELD3 has
fewer than 5 characters. The next record has seven characters in
FIELD3 and five in FIELD1. This current record passes the conditions for
this rule. The current FIELD3 (from the second record) becomes the
best field. The third record, with five characters in FIELD3 and seven
in FIELD1, also passes the conditions, and FIELD3 from this record
replaces the best value as the new best value.
When you define multiple rules for the same target, the rule that
appears later in the list of rules has precedence. For example, if you
have two rules for the target FIELD1, the value from the record that
meets the conditions of the second rule as listed becomes the best
value. If no target passes the second rule, the best value from the
first rule becomes part of the output record.
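
The FIELD3 example can be traced with a minimal Python sketch (illustration only; not QualityStage code). The function name survive_field3 and the field contents are hypothetical placeholders chosen to have the character counts in the table; only the lengths matter.

```python
def survive_field3(group):
    """Apply the rule
    FIELD3: (SIZEOF (TRIM c.FIELD3) >= 5) AND (SIZEOF (TRIM c.FIELD1) > 0) ;
    to one group and return the surviving FIELD3 value.
    """
    best = None  # there is no best value before the first record
    for current in group:
        field3_ok = len(current["FIELD3"].strip(" ")) >= 5
        field1_ok = len(current["FIELD1"].strip(" ")) > 0
        if field3_ok and field1_ok:
            best = current["FIELD3"]  # replaces any earlier best value
    return best

group = [
    {"FIELD1": "abc",     "FIELD3": "xy"},       # 3 and 2 characters
    {"FIELD1": "abcde",   "FIELD3": "1234567"},  # 5 and 7 characters
    {"FIELD1": "abcdefg", "FIELD3": "12345"},    # 7 and 5 characters
]
print(survive_field3(group))
```

The second record sets the best value first, and the third record then replaces it, so the five-character FIELD3 value from the third record is returned.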

Examples of Rule Syntax


This section describes the syntax for some typical Survive rules.


First Record in Group Survives


If you want to assign the first record in the group automatically as the
best record, you must define a field in your input file (and in your
results file) for the entire record. You can then define the following
rule (assuming you named the field RECORD):
RECORD : (SIZEOF (TRIM B.RECORD) = 0) AND (SIZEOF (TRIM
C.RECORD) >= 0);
This rule should appear before any other rules surviving RECORD.

At Least One from Each Group Survives


If you want to make sure that at least one record in the group
survives, you can define the following rule (assuming you named the
field RECORD), which selects the last record in the file to survive if no
other record is selected:
RECORD : 1 ;

Date as a Target
To make the record with the greater (more recent) value survive,
compare the current value with the best value, as follows:
DATE : c.YEAR > b.YEAR ;

Multiple Targets
In the following example the rule has two target fields, NAME and
PHONE. The current values should survive if the current year is
greater than the best year:
NAME PHONE : c.YEAR > b.YEAR ;

Using the File from Which the Record Originated


To use the file from which the record originated, you assign each
record a file identifier indicating the file ID. You then define the
condition for that field.


Using the Length of Data


To use the length of data in the field, the field must have the String
field use type. You use the SIZEOF function. You might also want to use
the TRIM function to remove leading and trailing spaces; for example:
TARGET : ( SIZEOF (TRIM c.FIELD1) > SIZEOF (TRIM
b.FIELD1) ) ;
In this example, the rule causes the first name and middle name to
survive based on the longest first name:
FNNAMES MNNAMES: (SIZEOF (TRIM C.FNNAMES) > SIZEOF (TRIM
B.FNNAMES));
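
The behavior of the FNNAMES rule can be sketched in Python (illustration only; not QualityStage code). The function name and the sample names are hypothetical. The sketch assumes that the best fields are empty before the first record, so the first non-blank first name always passes.

```python
def survive_longest_first_name(group):
    """Apply the rule
    FNNAMES MNNAMES: (SIZEOF (TRIM C.FNNAMES) > SIZEOF (TRIM B.FNNAMES));
    Both targets are taken from the record that passes the test.
    """
    # Assumption: the best fields start out empty.
    best = {"FNNAMES": "", "MNNAMES": ""}
    for current in group:
        current_len = len(current["FNNAMES"].strip(" "))
        best_len = len(best["FNNAMES"].strip(" "))
        if current_len > best_len:
            best = {"FNNAMES": current["FNNAMES"],
                    "MNNAMES": current["MNNAMES"]}
    return best

group = [
    {"FNNAMES": "BOB   ", "MNNAMES": "J"},
    {"FNNAMES": "ROBERT", "MNNAMES": "JAMES"},
    {"FNNAMES": "ROB",    "MNNAMES": ""},
]
print(survive_longest_first_name(group))
```

Because the comparison is strictly greater than, a later record with a first name of the same length does not replace the best value in this sketch.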

Using Frequency
To specify that the most frequent value of a field within each group
survives you first specify the field followed by the keyword
FREQUENCY separated by a colon, as in this example:
FIELD1: FREQUENCY ;

Multiple Rules
If you have multiple rules for a surviving field, the value that satisfies
the later rule survives. In the following example, RECORD is the entire
record, and TYPE, FREQ, FIRSTACC, and V9 are fields in the record.
Using the following rules:
RECORD : (c.TYPE<>"DD") ;
RECORD : (c.FREQ>b.FREQ) ;
RECORD : (c.FIRSTACC = 1) ;
V9: (c.V9 > b.V9) ;
the following is true:
• If a record satisfies the last rule for the target RECORD, that is, the
value for field FIRSTACC is 1, the record becomes the best record
(b.RECORD).
• If more than one record in the group passes the last rule, the latest
record processed that satisfies the rule survives.


• If no records pass the FIRSTACC rule, the last record processed that
passes the c.FREQ>b.FREQ rule survives.
• If no records pass the FREQ rule, the last record processed that
passes the c.TYPE<>"DD" rule survives.
• If no records pass any of the rules, the surviving record is all
blanks.
However, in this set of rules there is a rule for one of the fields (V9) in
the RECORD to survive. Since the V9 rule appears later in the list of
rules than the rules for RECORD, it takes precedence over whatever
value survives for that field in RECORD.
In the following example, you have three records in a group with the
following field values:

TYPE FREQ FIRSTACC V9


MD 3 2 19990401
DD 4 1 19990314
DN 5 4 19980302

In this example the following output record survives:

DD 4 1 19990401

The second input record survives the rule for the RECORD target,
because FIRSTACC=1, but the first input record provides the surviving
value for V9. If the FIRSTACC were not equal to 1 for any of these
records, the third record would survive the RECORD target since it has
the highest value for FREQ.
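
The outcome of this example can be reproduced with a simplified Python model (illustration only; not QualityStage code). The names EMPTY, RULES, and survive are hypothetical. The model assumes that each rule keeps its own best record, that an empty best record has blank strings and zero integers, and that rules listed later overwrite the results of earlier rules. The actual Survive implementation may differ.

```python
# Assumption: an empty best record has blank strings and zero integers.
EMPTY = {"TYPE": "", "FREQ": 0, "FIRSTACC": 0, "V9": 0}

# (target, condition) pairs in listed order; c = current, b = best.
RULES = [
    ("RECORD", lambda c, b: c["TYPE"] != "DD"),
    ("RECORD", lambda c, b: c["FREQ"] > b["FREQ"]),
    ("RECORD", lambda c, b: c["FIRSTACC"] == 1),
    ("V9",     lambda c, b: c["V9"] > b["V9"]),
]

def survive(group):
    # Assumption: each rule tracks its own best record.
    bests = [None] * len(RULES)
    for current in group:
        for i, (_, condition) in enumerate(RULES):
            if condition(current, bests[i] or EMPTY):
                bests[i] = current
    output = dict(EMPTY)
    # Later rules are applied last, so they take precedence.
    for (target, _), best in zip(RULES, bests):
        if best is None:
            continue
        if target == "RECORD":
            output.update(best)            # the whole record survives
        else:
            output[target] = best[target]  # a single field survives
    return output

group = [
    {"TYPE": "MD", "FREQ": 3, "FIRSTACC": 2, "V9": 19990401},
    {"TYPE": "DD", "FREQ": 4, "FIRSTACC": 1, "V9": 19990314},
    {"TYPE": "DN", "FREQ": 5, "FIRSTACC": 4, "V9": 19980302},
]
print(survive(group))
```

Under these assumptions the second record supplies TYPE, FREQ, and FIRSTACC, and the first record supplies V9, which matches the output record shown above.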



15 Working with QualityStage Reports

You can view input and output data from your jobs using:
• QualityStage Reports
• QualityStage Data File and Report Viewer
Chapter 16 describes how to use the QualityStage Data File and
Report Viewer.
This chapter describes how to work with QualityStage Reports. It
describes:
• How to use stage wizards to prepare data for a QualityStage Report
• How to create and run QualityStage Reports using unprepared
data
• How to create customized Access reports
• How to generate and view QualityStage Reports
• How to select and run a QualityStage Report
The final section of this chapter describes each of the predefined
QualityStage reports.


Using Stage Wizards to Prepare Data for Predefined QualityStage Reports
The QualityStage Report feature lets you create and run predefined
reports using output data from stages in jobs that you run.
QualityStage uses a Microsoft Access 2000 relational database to
create your reports.
QualityStage is shipped with a Reports database that contains a set of
predefined reports.
When you define an Investigate, Standardize, Match, or Survive
stage, you prepare your report data as part of the job definition.
The File Mode Execution window also lets you set report data options.
You prepare data for a QualityStage report when you are using the
stage wizards to define a stage. For information about preparing data
for predefined QualityStage reports, see the following chapters:
• Chapter 9, “Defining Investigate Stages”
• Chapter 10, “Defining Standardize Stages”
• Chapter 12, “Defining Match Stages”
• Chapter 14, “Defining Survive Stages”

Creating and Running QualityStage Reports Using Unprepared Data
The instructions to create reports that are integrated into
QualityStage assume that the data has already been created and is
available to QualityStage.
• Flat file data must be in text-delimited form. You must know the
name of each field.
• Data in relational databases must have been loaded into tables
that you created in the database. If multiple tables are used in a
single report, you must create a relationship between the tables.


Preparing Your Data


The Reports database uses data from files in one of the following
locations:
• An ODBC relational database
• An Access relational database
• A text-delimited file (flat file)

Converting Output Files


You must convert QualityStage output files into one of the following:
• Flat files
• ODBC data files
Use the Format Convert stage to do this. For more information about
using this stage, see the QualityStage Stages Reference Guide.

Adding a .txt Extension to Flat Files


All flat files must have a .txt extension. We recommend that you use
the Program stage to append this extension. For more information
about using this stage, see the QualityStage Stages Reference Guide.

File Location
Since Microsoft Access must link to the input data, the data must be
accessible to the QualityStage Designer host either locally or
remotely.
Access retrieves all remote data across your network to generate the
report. Network performance affects the speed with which Access
generates the report.


Creating Customized Access Reports


In addition to the predefined QualityStage reports, you can create
customized reports and either:
• Add them to the default database, or
• Create a new Reports database.

Note: If you want to create your own customized reports, you must have
installed and be familiar with Microsoft Access 2000. The
instructions in this section for creating customized reports
assume that you are an experienced user of Access.

Designing a Report
The database in which you design a report can be:
• The Reports database (reports.mdb) shipped with QualityStage, or
• Any database that you create to hold report definitions.
Determine which fields to display in the report before you create the
report in Access.

Creating a Table
Tables that have the same name and fields in the Access reports
database and other relational databases are known as mirror tables. If
the data is in a relational table, the tables that are created must
mirror the structure of the relational tables.
To create a table:

1. Create (do not populate) tables in Access that mirror your data.
If the data is in a flat file, the tables should map to the fixed-field
version of the data before the data is converted to delimited text.

Important: QualityStage does not support a method for exporting file
metadata from the QualityStage database such that it can be
imported by Access to create an Access table for reports.


2. If multiple tables are used in a single report, create a relationship
among the tables so that each table maps to one data file.
3. Populate the tables with test data.

Creating a Query
Use these instructions if you are familiar with SQL. The query will be
used on the relational database with the populated tables, but
compose the SELECT statement without including an IN clause, as if
the statement were to be used on the local Access database tables. At
run time the tables are linked to the Access Reports database and are
local to the reports database.
To create a query:

1. Select Queries from the Objects list.

2. Click .
3. Select Design View from the New Query dialog box and close the
Show Table dialog box.
4. Select Create query in Design view.
5. Change the view of the query to SQL and compose a SELECT
statement.

Creating a Query With the Query Wizard


If you are not familiar with SQL, you can create a query with the
Query Wizard in Access. The tables for the query exist in the local
Access database.
To create a query with the Query Wizard:

1. Select Queries in the Objects field.

2. Click on the Database window toolbar.


3. Select the Simple Query Wizard from the New Query window.


4. Click OK.
5. Select the table or query from the Table/Queries list.
6. Move your selected Available Fields to the Selected Fields.
7. Click Next.
8. Enter a query title in the Simple Query Wizard window.
9. Click the Finish button to complete the Query Wizard job. The
options you selected in the Query Wizard appear in the query that
is generated from your selections.

Creating a Report in the Design View


You can create a report with the Design View of Access if you want to
customize your reports.
To create a report in the Design View:

1. Select Reports from the Objects list in the local Access Reports
database.

2. Click on the Database window toolbar.


3. Select Design View on the New Report dialog box.
4. Select a table or query from the list on the New Report dialog box.
5. Click OK.
6. Select View ➤ Field List to view the list of possible fields to
display on the report after closing properties.
7. You can drag these fields onto the report and organize them as
desired. You can add a report title as well as a page count or a
date.

Creating a Report with the Report Wizard


If you want to create a report with the templates provided with
Access, use the Report wizard.


To create a report with the Report Wizard:

1. Select Reports in the Objects field.

2. Click on the Database window toolbar.


3. Select Report Wizard on the New Report dialog box.
4. Select a table or query from the list.
5. Click OK.
6. Select the table or query from the Table/Queries list on the Report
Wizard dialog box.
7. Move your selected Available Fields to the Selected Fields.
8. Click Next.
9. Select the group options, sort options, layout options, report style,
and report title from the Report Wizard dialog box.
10. Click Finish to complete the Report Wizard job. The options you
selected in the Report Wizard appear in the report that is
generated from your selections.

Testing and Debugging the Report


After you create the report, you can verify that the report displays the
intended data.
To test and debug the report in Access:

1. Click in the Database window toolbar to open the report.
2. Close the report after you have examined and verified the
contents.
3. Select Tables in the Objects field.
4. Right-click the table and select Delete to delete the mirror tables
from the Access database.
5. Add Report Integration Information.


Each new report should be entered in the REPORT_TABLES table.
Create one record for each report/table pair. See “Reports
Database” on page 15-14 for more information about the
REPORT_TABLES table.

Creating a Specification
For data in flat files, create a specification for the file in Access. The
specification stores the information such as the delimiter type and the
field names needed by Access to link the flat file.
To create a specification for a flat file in Access:
1. Open the Reports database.
2. Select File ➤ Get External Data ➤ Link Tables to open the Link
dialog box.
3. Double-click the flat file for which you would like to create a
Specification.
The Link Text Wizard dialog box opens.
4. Click the Advanced button on the bottom of the Link Text Wizard
dialog box to open the Specification dialog box.
5. Choose the file format. For flat files, choose Delimited, and in
Field Delimiter type choose the delimiter type.
6. Enter the field names of the flat file in the Field Information grid.
Choose the appropriate data type for each field.
7. Once all the field names have been entered, click Save As.
8. Save the specification file as the name of the flat file. For example,
if the flat file is called INPUT, save the specification for that flat file
as INPUT.
9. After you save the specification, click Cancel to exit the Link Text
Wizard.
The specification for that flat file is saved in the Reports database.
10. Exit Microsoft Access.


11. Run the report from the Report Generator dialog box. See
“Generating and Viewing QualityStage Reports” on page 15-9 for
more information.

Generating and Viewing QualityStage Reports


The Report Generator dialog box lets you locate and run a report for
the data you are looking for within the database.
You must specify:
• The database location or the data location
• The report database containing the predefined reports
• The name of the report you want to generate
You can create and modify reports in Microsoft Access (see “Creating
Customized Access Reports” on page 15-4).
To open the Report Generator dialog box:

1. Select File ➤ Reports.


The Report Generator dialog box looks like this:

Note: The Report Generator dialog box defaults to the contents of
the input fields that were last entered.

Specifying the Data Location


Your data can reside on any local or remote relational database or
text-delimited flat file that is accessible to QualityStage Designer.
Select the database option that is appropriate for your data file
configuration:
• ODBC
• Microsoft Access Database
• Flat file


ODBC
To access a remote relational database via ODBC:

1. Select ODBC.
The ODBC data source and driver must be preconfigured through
Windows.
2. Enter the Data Source Name (DSN), Username (optional),
Password (optional), and the Database name (required if there is
more than one database on a server).

Microsoft Access Database


To access a remote or local Microsoft Access database:

1. Select Microsoft Access Database.


2. Enter the data file path in the Database Location field if you know
the file location of your database, or click to browse for the file
name in the Open window.

Flat Files
To access a flat file:

1. Select Flat Files.


2. Enter the file location in the Flat Files Directory field.
All the flat files required for one report must be located in the
same directory.
3. Create a specification file in Microsoft Access to provide
information about the flat file so it can be linked. See “Creating a
Specification” on page 15-8 for detailed instructions.
The specification file contains information including the names of
the fields in each flat file and the delimiter type.
Specifications for the flat files for all predefined reports (see
“About Predefined QualityStage Reports” on page 15-14) are


located in the default Reports database. These specifications
require that:
• Fields be delimited by tabs, and
• Text be qualified by double quotation marks.
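
A flat file that meets these two requirements, and that carries the required .txt extension, can be produced in many ways. The following Python sketch is one illustration (it is not part of QualityStage, and the function name, file name, and sample row are hypothetical). It assumes that every field is treated as text and is therefore enclosed in double quotation marks.

```python
import csv
import os
import tempfile

def write_flat_file(path, rows):
    """Write rows as tab-delimited text with each field in double quotes."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(
            handle,
            delimiter="\t",         # fields delimited by tabs
            quotechar='"',          # text qualified by double quotes
            quoting=csv.QUOTE_ALL,  # assumption: quote every field
        )
        writer.writerows(rows)

# Hypothetical file name and data; the .txt extension is required.
path = os.path.join(tempfile.mkdtemp(), "INPUT.txt")
write_flat_file(path, [["1", "JOHN SMITH", "10 MAIN ST"]])

with open(path, newline="") as handle:
    print(handle.read())
```

If your data contains numeric fields that Access should read as numbers, you might need a different quoting choice; check the data types you enter in the specification.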

Specifying the Reports Database Location


To specify the location of the Reports database, which contains the
predefined reports delivered with QualityStage, do one of the
following:
• Enter the location of the database into the Reports Database field.

• Click to browse for the file name in the Open window.


By default, the Reports database is located in your QualityStage
Designer directory at:
\Ascential\QualityStageDesigner<version>\Reports.mdb
For example: C:\Ascential\QualityStageDesigner70\Reports.mdb
For more information about the Reports database, see “Reports
Database” on page 15-14.

Selecting and Running a QualityStage Report

Important: You must specify a printer in your Windows Printers folder before
you run a report. Failure to do so prevents processing. Before you
try to run the report, select Settings ➤ Printers from the Start
menu, and then specify a printer.

How to select and run a report
1. From the list of reports, select the report to run. (For a description
of the predefined QualityStage reports, see “About Predefined
QualityStage Reports” on page 15-14.)
2. Click Formatted Reports.


When you click Formatted Reports, your data files are linked to
the Reports database, and Microsoft Access generates the report.
After Access finishes running the report, the report opens in
preview mode in Access (not in QualityStage). You can:
• View the report
• Print or export the report
• Close the Access report window
• Run another report
• Close the Report Generator dialog box
3. Click Finish after you run your reports.

Running the report again
When you generate the same report at another time, QualityStage
removes the previous links to the data tables in the Reports database
and creates new links according to the data location you specify at this
time. This ensures that the same report does not run with the
incorrect data tables you linked earlier.

Saved Report Entries


When you exit the Report Generator dialog box, QualityStage saves
your entries in a ReportEntries.ini file. The next time the reports screen
appears, QualityStage populates the fields on the Report Generator
dialog box with the entries from this file.
The entries saved in the file are:
• ODBC data source name
• ODBC username
• Location of the Access database
• Flat files directory
• Location of the reports database


About Predefined QualityStage Reports


This section describes the predefined reports that are delivered with
the QualityStage Designer. These reports can be viewed on the Report
Generator dialog box.
See “How to select and run a report” on page 15-12 for instructions on
how to select and run a report with QualityStage.

Reports Database
The Reports database included with QualityStage Designer is a
Microsoft Access database and contains predefined reports and
associated queries.
The Reports database also contains a table called REPORT_TABLES,
which has two fields:
• ReportName: Contains the name of a report.
• TableName: Contains the name of the data table or flat file that is
queried by the corresponding ReportName.
For example, if REPORT_TABLES contains:
• A report named Standardization US Summary Report that queries
the tables (or flat files) INPUT and US020000
• A report named Match Summary Report that queries one table (or
flat file) MTCHGRP
the REPORT_TABLES table would look like this:

ReportName TableName
Standardization US Summary Report INPUT
Standardization US Summary Report US020000
Match Summary Report MTCHGRP

QualityStage uses the TableName values from REPORT_TABLES to link
your data tables to the Reports database at run time.
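
The lookup that REPORT_TABLES supports can be sketched in Python (illustration only; this is not how QualityStage or Access is implemented, and the function name tables_for is hypothetical). The rows are the ones from the example table above.

```python
# One (ReportName, TableName) pair per record, as in the example table.
REPORT_TABLES = [
    ("Standardization US Summary Report", "INPUT"),
    ("Standardization US Summary Report", "US020000"),
    ("Match Summary Report", "MTCHGRP"),
]

def tables_for(report_name):
    """Return the data tables (or flat files) that a report queries."""
    return [table for name, table in REPORT_TABLES if name == report_name]

print(tables_for("Standardization US Summary Report"))
print(tables_for("Match Summary Report"))
```

A report that queries two tables therefore needs two records, which is why a custom report must have one record for each report/table pair.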


Investigation Reports
For a given option you select in the Options box in the Stage
Definition Wizard, you run the appropriate report to display the
results.
For instance, if you select Investigation Character Discrete in the
Wizard, you run the Investigation Character Discrete report.
This table shows the Investigation option and the report you should
use with each option.

Investigation Option Report Name


Character Discrete Investigation Character Discrete Report
Character Discrete Investigation Character Type Report
Word Investigation Word Report

Preparing the Investigation Output File


Before you use the output from an Investigate job to create a report,
you must do the following:

1. Use the Program stage to remove the .srt extension from the
output file name.
2. Use the Format Convert stage to convert the output file to a flat
file or an ODBC file (see “Converting Output Files” on page 15-3).
3. If you converted the output file to a flat file, use the Program stage
to add the .txt extension to the file name (see “Adding a .txt
Extension to Flat Files” on page 15-3).

Investigation Character Discrete and Character Type Reports


The Investigation Character Discrete and Character Type Reports
both contain information that describes a single domain field. Choose


this when the field mask for each field is All C. The report can be used
to report on multiple single domain fields that are grouped separately.

Report Table Report Field Comments


CHARDISC FIELD The field being investigated.
TOK_COUNT The number of times this exact token
appears in the file.
PERCENT The percentage of the total this
field represents.
TOKEN The actual token, when the field
mask is All C. The pattern
representation (unclassified and
classified tokens) when the field
mask is T.
EXAMPLE An example of actual data.

Investigation Word Report


This report contains information that describes the pattern of a
free-form field.

Report Table Report Field Comments


WORD TOK_COUNT The number of times this exact token
appears in the file.
PERCENT The percentage of the total this field
represents.
PATTERN The pattern representation of the field
(unclassified and classified tokens).
EXAMPLE An example of actual data.


Country-specific Standardization Reports


These reports are available for a specific set of countries supported by
the QualityStage standardization rule sets. Each standardization
report shares identical requirements and layout characteristics.

Country Codes
The country-specific standardization report names use the country’s
two-character ISO abbreviation. These are the same abbreviations
used for the rule sets.
For example, the following countries use these abbreviations:
• US (United States)
• GB (Great Britain)
• CA (Canada)
• FR (France)
• DE (Germany)
• AU (Australia)
• IT (Italy)
• ES (Spain)
Thus the Standardization US Report is for the United States, the
Standardization GB Report is for Great Britain, and so on.
This chapter uses CC to represent the country code in the report
names.

Choosing the Correct Standardization Report


You choose the correct report for your job based on the settings you
made in QualityStage Designer as follows:
• The append option in the Option box in the Stage Definition Wizard
dialog box
• The rule sets in the Standardize Wizard — Command Definition
dialog box
For instance, you select the Standardization US Report if you used:


• The USNAME, USAREA, and USADDR rule sets, and


• The No Append option.

Country-Specific Standardization Reports


This table shows the standardization report to select based on the rule
set and append option you use.

Rule Set                    Append Option   Report Name
NAME, AREA, & ADDR          No Append       Standardization CC Report
                            Append All      Standardization CC Appended Report
NAME, AREA, ADDR, & PREP    No Append       Standardization CC Report with Prep
                            Append All      Standardization CC Appended Report with Prep
NAME                        No Append       Standardization CC Name Report
                            Append All      Standardization CC Appended Name Report
AREA                        No Append       Standardization CC Area Report
                            Append All      Standardization CC Appended Area Report
ADDR                        No Append       Standardization CC Address Report
                            Append All      Standardization CC Appended Address Report
PREP                        No Append       Standardization CC Prep Report
                            Append All      Standardization CC Appended Prep Report
NAME, ADDR, & AREA          No Append       Standardization CC Summary Report
                            Append All      Standardization CC Appended Summary Report
NAME, ADDR, AREA, & PREP    No Append       Standardization CC Summary Report with Prep
                            Append All      Standardization CC Appended Summary Report with Prep
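
The selection logic in the preceding table can be sketched in Python as follows. This is an illustration only and is not part of QualityStage: the function name, its parameters, and the summary flag are assumptions introduced here. Only the report names themselves come from the table.

```python
# Report names below are copied from the table above. The function,
# its parameters, and the "summary" flag are illustrative assumptions.
CORE_SETS = frozenset({"NAME", "AREA", "ADDR"})
SINGLE_SET_LABELS = {"NAME": "Name", "AREA": "Area", "ADDR": "Address", "PREP": "Prep"}


def report_name(country_code, rule_sets, appended=False, summary=False):
    """Return the predefined report name for a rule-set/append combination."""
    sets = frozenset(name.upper() for name in rule_sets)
    prefix = "Standardization " + country_code.upper() + " "
    if appended:
        prefix += "Appended "

    # A single rule set maps to its own report (Name, Area, Address, or Prep).
    if len(sets) == 1 and not summary:
        (only,) = sets
        if only in SINGLE_SET_LABELS:
            return prefix + SINGLE_SET_LABELS[only] + " Report"

    # NAME + AREA + ADDR, optionally with PREP, maps to the combined reports.
    if sets in (CORE_SETS, CORE_SETS | {"PREP"}):
        kind = "Summary Report" if summary else "Report"
        suffix = " with Prep" if "PREP" in sets else ""
        return prefix + kind + suffix

    raise ValueError("no predefined report for rule sets: " + ", ".join(sorted(sets)))


print(report_name("US", ["NAME", "AREA", "ADDR"]))
```

The summary flag is needed because the NAME, ADDR, and AREA combination appears twice in the table, once for the detail report and once for the summary report.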

Country-Specific Standardization Attributes


Country-specific standardization reports share the following
attributes:
• Display a maximum of six data input fields. These fields typically
represent data such as: NAME, ADDRESS 1, ADDRESS 2, CITY,
STATE, ZIP.
• Contain a RECKEY, which is a unique ID for the input.
• Use preconfigured Name, Address, Area, and Prep rule set
business-intelligence fields, depending upon the report you select.
For more information about these fields, see “Domain-Specific
Dictionary Files” on page D-11.
• Use the dictionary abbreviation and the first five characters of the
rule set name (country code and the first three characters of the
rule set suffix) to name the dictionary file fields. For example:
– HNUSADD, where HN = House #, US = Country Code, and ADD =
Address Rule Set
– CNUSARE, where CN = City Name Abbreviation, US = Country
Code, and ARE = Area Rule Set
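
The naming rule for dictionary file fields can be expressed as a short Python sketch. The function name is an assumption made for illustration; the rule itself (dictionary abbreviation followed by the first five characters of the rule set name) is the one described above.

```python
def dictionary_field_name(abbreviation, rule_set_name):
    """Build a dictionary file field name.

    The result is the dictionary abbreviation followed by the first five
    characters of the rule set name (the two-character country code plus
    the first three characters of the rule set suffix).
    """
    return abbreviation.upper() + rule_set_name.upper()[:5]


# The two examples given in the text.
print(dictionary_field_name("HN", "USADDR"))
print(dictionary_field_name("CN", "USAREA"))
```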


Standardization CC Report
This report contains the input fields and preconfigured business
intelligence fields from the NAME, ADDR, and AREA rule sets.

Report Table Report Field


INPUT RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6
CC020000 Preconfigured fields from the NAME, ADDR, and
AREA Standardization rule sets

Standardization CC Appended Report


This report contains preconfigured business intelligence fields from
the NAME, ADDR, and AREA rule sets and the input fields.

Report Table Report Field


CCA20000 Preconfigured fields from the NAME, ADDR, and
AREA Standardization rule sets
RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6


Standardization CC Report with Prep


This report contains the input fields and preconfigured business
intelligence fields from the NAME, ADDR, AREA, and PREP rule
sets.

Report Table Report Field


INPUT RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6
CC020000 Preconfigured fields from the NAME, ADDR, and
AREA Standardization rule sets
CC0PREP0 Preconfigured fields from the PREP Standardization
rule set


Standardization CC Appended Report with Prep


This report contains the input fields and preconfigured business
intelligence fields from the NAME, ADDR, AREA, and PREP rule
sets. The input is appended.

Report Table Report Field


CCA20000 Preconfigured fields from the NAME, ADDR, and
AREA Standardization rule sets
RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6
CCAPREP0 Preconfigured fields from the PREP Standardization
rule set

Standardization CC Name Report


This report contains the input fields and preconfigured business
intelligence fields from the NAME rule set.

Report Table Report Field


INPUT RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6
CC0NAME0 Preconfigured fields from the NAME Standardization
rule set


Standardization CC Appended Name Report


This report contains the input fields and preconfigured business
intelligence fields from the NAME rule set.

Report Table Report Field


CCANAME0 Preconfigured fields from the NAME Standardization
rule set
RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6

Standardization CC Area Report


This report contains the input fields and preconfigured business
intelligence fields from the AREA rule set.

Report Table Report Field


INPUT RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6
CC0AREA0 Preconfigured fields from the AREA Standardization
rule set


Standardization CC Appended Area Report


This report contains the input fields and preconfigured business
intelligence fields from the AREA rule set.

Report Table Report Field


CCAAREA0 Preconfigured fields from the AREA Standardization
rule set
RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6

Standardization CC Address Report


This report contains the input fields and preconfigured business
intelligence fields from the ADDR rule set.

Report Table Report Field


INPUT RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6
CC0ADDR0 Preconfigured fields from the ADDR Standardization
rule set


Standardization CC Appended Address Report


This report contains the input fields and preconfigured business
intelligence fields from the ADDR rule set.

Report Table Report Field


CCAADDR0 Preconfigured fields from the ADDR Standardization
rule set
RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6

Standardization CC Prep Report


This report contains the input fields and preconfigured business
intelligence fields from the PREP rule set.

Report Table Report Field


INPUT RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6
CC0PREP0 Preconfigured fields from the PREP Standardization
rule set


Standardization CC Appended Prep Report


This report contains the input fields and preconfigured business
intelligence fields from the PREP rule set.

Report Table Report Field


CCAPREP0 Preconfigured fields from the PREP Standardization
rule set
RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6

Standardization CC Summary Report


This report contains summary information from the NAME, ADDR,
and AREA rule sets. This information includes the number of records:
• Processed.
• Fully processed.
• With unhandled data.
• That have no address, area, or name information.
• With additional name and address information.
• Broken down by address type (for example, street or box).
This report uses the INPUT and CC020000 tables.

Standardization CC Appended Summary Report


This report contains the original input fields and the summary
information from the NAME, ADDR, and AREA rule sets. The
information includes the number of records:


• Processed.
• Fully processed.
• With unhandled data.
• That have no address, area or name information.
• With additional name and address information.
• Broken down by address type (for example, street or box).
This report uses the CCA20000 table.

Standardization CC Summary Report with Prep


This report contains summary information from the NAME, ADDR,
AREA, and PREP rule sets. This information includes the number of
records:
• Processed.
• Fully processed.
• With unhandled data.
• That have no address, area or name information.
• With additional name and address information.
• Broken down by address type (for example, street or box).
This report uses the INPUT, CC020000, and CC0PREP0 tables.

Standardization CC Appended Summary Report with Prep


This report contains summary information, including the total number
of records:
• Processed.
• Fully processed.
• With unhandled data.
• That have no address, area, name, or other information.
• With additional name and address information.


• Broken down by address type


This report uses the CCA20000 and CCAPREP0 tables.

Standardization WAVES/Multinational Report


This report is used when you run a Multinational Standardize job or a
QualityStage WAVES stage. The report contains preconfigured
business intelligence fields from the QualityStage WAVES rule set.
The FIELDS in the report are input fields.

Report Table Report Field


MNVWAVE0 Preconfigured name information from the WAVES
rule set
INPUT RECKEY
FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6

Standardization WAVES/Multinational Appended Report


This report is used when you run a Multinational Standardize job or a
QualityStage WAVES stage. The report contains the original input
fields and preconfigured business intelligence fields from the
QualityStage WAVES rule set. The FIELDS in the report are input
fields.

Report Table Report Field


MNVWAVEA Preconfigured name information from the WAVES
rule set
RECKEY


FIELD1
FIELD2
FIELD3
FIELD4
FIELD5
FIELD6

Matching Reports
The following reports are used with the output from Match jobs, as
determined by the match option you select in the Stage Definition
Wizard:

Match Option                              Report
All except Undup and Undup Independent    Match Grouping Summary
All except Undup and Undup Independent    Match Histogram
All except Undup and Undup Independent    Match Output Review Report
All except Undup and Undup Independent    Match Summary Report
Undup or Undup Independent                Match Unduplication Summary Report


Match Fields
The Match reports use the MTCHGRP table. The MTCHGRP table
contains the following fields:

Report Field Comments


RECKEY Unique record ID.
SET_NUM Match set number denoted by a number (1, 2, 3, etc.).
The match set numbers group together the records that
QualityStage determines have matching characteristics.
MATCH_TYPE Values: XA, MA, MB, CA, CB, RA, RB, DA, DB
PASS_NUM The number of the match pass.
WEIGHT The degree of probability of each record as compared
to the master record.
FIELD1– FIELD6 Input data fields

Note: The Match extract you use for Match reports must contain all the
fields listed in the preceding Match Fields table.

Match Grouping Summary Report


This report contains the mean total weight of matched records, the
standard deviation of the total weight of matched records, and the
number of groups with differing numbers of members.
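
As an illustration of the statistics this report describes, the following Python sketch computes them from a list of matched records. It is not the QualityStage implementation. It assumes each record is a dictionary holding the SET_NUM and WEIGHT fields of the MTCHGRP table, that the caller has already selected the matched records, and that the population standard deviation is intended.

```python
from collections import Counter
from statistics import mean, pstdev


def grouping_summary(matched_records):
    """Summarize matched records by weight and by match-set size.

    Each record is assumed to be a dict with SET_NUM and WEIGHT keys.
    """
    weights = [record["WEIGHT"] for record in matched_records]

    # Count the members of each match set, then count how many sets
    # have each member count (for example, 12 sets with 2 members).
    members_per_set = Counter(record["SET_NUM"] for record in matched_records)
    groups_by_size = Counter(members_per_set.values())

    return {
        "mean_weight": mean(weights),
        "stdev_weight": pstdev(weights),  # population form is an assumption
        "groups_by_size": dict(groups_by_size),
    }
```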

Match Histogram Report


This report contains a histogram that compares the number of
matches to the weight of the matches.

Match Output Review Report


This report is grouped by set number, pass, type, and weight to give an
ordered view of the match results for an unduplication run.


Match Summary Report


This report provides the number of matches, clericals, duplicates and
residuals from the A and B files, as well as the total number of
records, the total number of unique records, and the minimum and
maximum match rates.
The fields contain the set identifier; a two-letter abbreviation for the
record type, which is one of MA (match from file A), MB (match from
file B), CA (clerical from file A), CB (clerical from file B), DA
(duplicate from file A), DB (duplicate from file B), RA (residual from
file A), or RB (residual from file B); the match pass number; and the
weight. The remaining three fields can be used for other data from the
extract file.
The match output types XA or MP will not match correctly with this
report.
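
The per-type counts that this report provides can be approximated with a short Python sketch. This is an illustration under stated assumptions, not the report's actual logic: records are assumed to be dictionaries with a MATCH_TYPE key, and the function name is hypothetical. Record types outside the eight two-file types are returned separately so that they can be reviewed.

```python
from collections import Counter

# The eight record types listed for a two-file match.
TWO_FILE_TYPES = ("MA", "MB", "CA", "CB", "DA", "DB", "RA", "RB")


def tally_match_types(records):
    """Count records per match type and list any unexpected types."""
    counts = Counter(record["MATCH_TYPE"] for record in records)
    tallies = {match_type: counts.get(match_type, 0) for match_type in TWO_FILE_TYPES}
    unexpected = sorted(set(counts) - set(TWO_FILE_TYPES))
    return tallies, unexpected
```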

Match Unduplication Summary Report


This report provides the number of matches, clericals, duplicates, and
residuals from the A file, as well as the total number of records, the
total number of unique records, and the minimum and maximum
match rates. Use this report for an unduplication run. Use the
Match Summary Report for a two-file match.
The fields contain the set identifier; a two-letter abbreviation for the
record type, which is one of XA (match from file A), CA (clerical from
file A), DA (duplicate from file A), or RA (residual from file A); the
match pass number; and the weight. The remaining three fields can be
used for other data from the extract file.

Survivorship Report
The survivorship report provides before and after information for each
group of records, including the surviving record and any matching
records.


This report uses the SURVIN or the SURVOUT table.

Report Field Description


MTCH_TYPE Values can be:
XA Match from file A
MA Match from file A
MB Match from file B
CA Clerical from file A
CB Clerical from file B
RA Residual from file A
RB Residual from file B
DA Duplicate from file A
DB Duplicate from file B
SET_NUM Match set number denoted by a number (1, 2, 3, etc.).
The match set numbers group together the records that
QualityStage determines have matching characteristics.
NAME1 – NAME6 Input (SURVIN) or output (SURVOUT) fields that
contain name information.
ADDR1 – ADDR6 Input (SURVIN) or output (SURVOUT) fields that
contain address information.
AREA1 – AREA6 Input (SURVIN) or output (SURVOUT) fields that
contain area information.



16

Using the QualityStage Data File and Report Viewer

Input and output data files, and Match reports and extracts, can be
very large (on the order of gigabytes) and cannot be viewed easily with
traditional text editors such as vi or WordPad.
QualityStage includes a Data File and Report Viewer, which lets you
view and analyze the files used in stages without having to switch to
another application.
You view data files, reports, and extracts of a specific job. First you
select the job, and then you select the data file, report, or extract to
view.

Important: Be aware of the following:


• The Report Viewer requires INTEGRITY server version 3.8 or later
on the host machine. The host machine must also be configured to
accept FTP requests.
• The Report Viewer is unavailable for OS/390 hosts. If you select a
Stage that runs on an OS/390 host, QualityStage automatically
disables the Report Viewer.


Selecting the Job


1. On the left pane of the QualityStage main window, select a Jobs
folder.
2. On the right pane, right-click the job whose files you want to view.
3. Click Server Reports & Datafiles to access the Choose Datafile or
Report to View dialog box.

Choosing the File to View


The Choose Datafile or Report to View dialog box lists all input
datafiles, output datafiles, report files, and extract files associated
with the selected job.

Select the file you want to view and click View File to display the
selected file in the Report Viewer screen.


Tip: Double-click a file row to display the file.

To cancel viewing files, click Exit.

Using the QualityStage Data File and Report Viewer


When you initially access the Report Viewer, the first 27 lines of the
file appear in the box. Use the horizontal scroll bar as needed to view
more data.

Note: You cannot edit the file from this screen. You must use an
external text editor for editing.

After you finish viewing the file, click Exit to return to the Choose
Datafile or Report to View dialog box.


Navigating
Use the controls at the bottom left of the screen to navigate through
the file.

As you scroll through the file, the total lines in the file and the current
line numbers being viewed are noted at the top. The end of the file is
indicated with an [EOF] line.

To...                     Click...     or Press...
View previous 27 lines    Page Up      PgUp
View next 27 lines        Page Down    PgDn
Scroll up 1 line          Line Up      - or _
Scroll down 1 line        Line Down    = or +

To jump to a specific line:

1. Enter the line number in the Go To A Line text box.


2. Click Go to Line # or press Enter/Return. The specified line
number appears as the top line.

Note: If you entered a number greater than the number of lines in the
file, the last 27 lines of the file are shown.
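
The Go To behavior described above can be sketched in Python as follows. This is an illustration, not the Report Viewer's code. The function name is hypothetical, and treating a line number below 1 as line 1 is an assumption, because the guide does not describe that case.

```python
PAGE_SIZE = 27  # number of lines the Report Viewer shows at a time


def top_line_for_goto(requested, total_lines, page_size=PAGE_SIZE):
    """Return the 1-based line shown at the top after a Go To request."""
    # Top line of the final page, used when the request is past the end.
    last_top = max(1, total_lines - page_size + 1)
    if requested > total_lines:
        return last_top
    # Assumption: values below 1 are treated as line 1.
    return max(1, requested)
```

For a 100-line file, a request for line 500 shows lines 74 through 100, because 100 - 27 + 1 = 74.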


Searching
The Report Viewer also lets you search for a specific word in the file,
starting from the line that is currently at the top of the screen.

To jump to the nearest page that contains your specified text:

1. Enter the text in the Find Word text box.


2. (Optional) Change the search criteria:
• To search in the opposite direction, select the Search
Downwards or Search Upwards option accordingly.
• To perform a case-sensitive search, toggle the Case-Sensitive
Search option.
3. Click Find Word or press Enter/Return to start the search. The
Find Word button becomes a Cancel button; all other buttons and
options are temporarily disabled.
If the current file is very large, a progress message appears, where x is
the line the Report Viewer is currently searching, and y is the total
number of lines in the file.
Searching Line x of y
Click Cancel to abort the search and return the view to its previous
state.
If the Report Viewer finds the specified text on a line that is not on
the current page, that line becomes the new top line of the screen. If
the line is within the last 27 lines of the file, the last 27 lines are
shown.
If the Report Viewer cannot find the specified text, a message box
appears.
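
The search rules above can be summarized in a Python sketch. This is an approximation for illustration only. The function name and parameters are hypothetical, the default for case sensitivity is an assumption, and the file is assumed to be available as a list of lines in memory.

```python
def find_top_line(lines, text, top, downwards=True, case_sensitive=False,
                  page_size=27):
    """Return the new 1-based top line after a search, or None if not found.

    The search starts at the line currently at the top of the screen.
    """
    needle = text if case_sensitive else text.lower()
    total = len(lines)
    order = range(top, total + 1) if downwards else range(top, 0, -1)

    for number in order:
        line = lines[number - 1]
        haystack = line if case_sensitive else line.lower()
        if needle in haystack:
            # A hit on the current page leaves the view unchanged.
            if top <= number < top + page_size:
                return top
            # A hit within the last page shows the last page_size lines.
            last_top = max(1, total - page_size + 1)
            return min(number, last_top)
    return None  # the viewer shows a message box in this case
```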


Troubleshooting the Data File and Report Viewer


A variety of errors can occur when trying to access the Report Viewer.
Here are possible causes and solutions.

Problem: The Report Viewer behaves unpredictably, either entirely
missing a report or printing out only parts of others.
Solution: The Report Viewer is designed to work with files that have a
fixed line length. Each line must be the same size. This problem is
caused because the server uses a constant address lookup scheme. Pad
your file with spaces to make it fixed length. (Utilities will soon be
coming to automate this process.)

Problem: The Server Reports and Datafiles button is grayed out when I
select a stage with the file I want to view.
Solution: The stage is being run on an OS/390 host. The Report Viewer
is currently unavailable for OS/390.

Problem: The Report Viewer comes up with an error message
complaining about FTP.
Solution: You may not have an FTP daemon running on the host
machine. Make sure FTP is running.

Problem: Connection is aborted due to timeout or other failure.
Solution: One of two possible problems:
• Your host is being loaded down by other programs running on the
host machine. Remove the load.
• You are running a version of the INTEGRITY server that is older
than version 3.8. Get version 3.8 (or later) of the server and retry.

Problem: Connection is forcefully rejected.
Solution: You do not have a server running on the host machine. Get a
3.8 server version and retry.
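
Until a utility is available, a file can be padded to a fixed line length with a few lines of Python. The sketch below is not supplied with QualityStage. The function name is hypothetical, and it assumes a single-byte character encoding so that character counts equal byte counts.

```python
def pad_to_fixed_length(lines, width=None):
    """Pad every line with trailing spaces so that all lines are equally long.

    If width is omitted, the length of the longest line is used.
    Line terminators are removed; the caller adds them back when writing.
    """
    stripped = [line.rstrip("\r\n") for line in lines]
    if width is None:
        width = max((len(line) for line in stripped), default=0)
    for number, line in enumerate(stripped, start=1):
        if len(line) > width:
            raise ValueError("line %d is longer than %d characters" % (number, width))
    return [line.ljust(width) for line in stripped]
```

To rewrite a file, read its lines, pass them to the function, and write each returned line followed by a newline.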



A

Importing Projects from MVS and UNIX into QualityStage Designer

QualityStage provides the ability to migrate your QualityStage project
definitions from an MVS or UNIX system to QualityStage Designer.
To do this, you create an Interchange Metadata Format (IMF) file of
the definitions that you want to migrate.
To create your project, you need either a UNIX or a Windows server
running INTEGRITY version 3.0 or later.
After you create the IMF file, you use the QualityStage Designer to
import the project. For information on importing projects, see
“Importing Projects” on page 4-8. All warning messages are logged to a
file stored in your system’s temporary directory (usually C:\Temp) at
the end of the import.
This appendix describes the following procedures for migrating a
project:
• Preparing the project for migration
• Creating the IMF file


Preparing Your Project


You need to perform two steps in preparing your project:
• Collect the information from the UNIX or MVS system.
• Update the information for importing the project cleanly.

Collecting Information from the UNIX or MVS System


To import a project into QualityStage Designer, you need to collect the
following information:
• All job files, as either JCL procs or ARDs
• All control member files
• All data file definitions (the data repository)

For UNIX
If you are migrating from UNIX, you only need to create a list of the
full pathname for the job files and the name of the directories
containing the control members.

For MVS
If you are migrating from MVS, you need to do more work. The MVS
runtime provides a command, UNIXPORT, that steps through one
individual job within an application and produces a PDS for the JCL
procs and a PDS for the control members, including the Unijoin source
code (Save Language source). The outputs from this command are:
• GETLIST
• UNIX.SLIB
• UNIX.CLIB
To use the UNIXPORT command:
1. Using the ISPF panels, access the application you want to
migrate.


2. Go to the Edit JCL LIB (option 5.2).


3. Select the first job in your project.
4. On the command line of the Edit panel, enter UNIXPORT.
A panel that provides the status of the port appears. When the
port is complete, the GETLIST, which lists all of the control
members and all data sets referenced in the JCL, appears.
5. Select the second job in your project and repeat step 4.
6. Repeat step 5 for all jobs in the project being ported.

Important: You must select the jobs in the order in which the application
runs.

Transferring the PDS Contents


After you have prepared the project, you can transfer the contents of
the UNIX.SLIB PDS and the UNIX.CLIB PDS to your UNIX or Windows
server.
If you have defined the metadata for your data files (input, output,
and reference) in the data repository, you also want to transfer your
data repository. If you did not define the metadata for the data files,
you might want to do so, using the data repository, before you create
the IMF file. The conversion process uses the job files, the control files,
and the data definition files to produce data file names and field
information, including the name, starting position, length, and
description.
If you do not have data definitions for your files, the conversion
process creates definitions. Fields receive names beginning with a
unique alphabetic identifier followed by the starting position and the
length of the field. The description for the data file and each field is:
Created during conversion.

Updating the Project


Before you create the IMF file, you should examine your jobs to make
sure that they conform to the INTEGRITY Designer for version 3.0 or
later. The QualityStage server provides some stage options that are
currently not supported in the QualityStage Designer interface. These
options are:
• Transfer
– Skip Count
– Fill
– Fill Character
• Parse
– Capitalization Mode, such as CAPSON and CAPSOF
• Select
– Output Mode = TAB
– Select Key
– Convert Key
– Output Skip
– Output Length
• Unijoin
– Output Mode = NONE
– Interval equate fields
– The following EQUATE comparison codes:
Field Numeric (FN)
Field Signed (FS)
Field Floating (FF)
Reference Interval (RI)
Reference Parity Interval (RP)
Reference Signed Interval (RS)
Reference Floating Interval (RF)
Data Interval (DI)
Data Parity (DP)
Data Signed Interval (DS)
Data Floating Interval (DF)

If you have operations using these options, you need to review their
control files and determine if you need to edit and remove or change
these options.

QualityStage Designer permits only one input data file to any stage,
except the Unijoin stage. If you have a stage, such as the Sort stage,
with two input files, you might want to create two separate Sort
stages, one for each input data file. Otherwise, when you create the
IMF file, only one input file will be used for the stage.

Tip: If you are using the Sort stage to concatenate files, select Append
to File when you select the output file.

Creating the IMF File


Use the jcl_cnv command to convert your job, control, and data files
into an IMF file. This command allows you to specify three input files
that list the location and name of all job, control, and data files.
These input files have specific formats to which you must adhere. This
section describes the input file formats, the syntax of the jcl_cnv
command, conversion issues, and the types of messages reported.

Input File Formats


The three input files are:
• Job list file
• Control list file
• Data definition list file


Job List File


The job list file contains a list of all jobs to be converted to IMF. Each
line of this file must use the following format:
field1 field2 [field3]

Field Description
field1 Contains the full pathname of each job file to be converted.
field2 Contains the keyword JCL or ARD indicating what type of
job.
field3 Contains a default job name for the operations in the JCL or
ARD file if a job name is not found. Optional argument.

An example of a job list file is:


/home/dave/data/SH5RETAG JCL Proc1
/home/dave/data/PR1SHIPT JCL Proc5
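
Before you run the conversion, you may want to check a job list file against this format. The following Python sketch is one way to do so. It is not part of QualityStage; the function name is hypothetical, and it assumes that fields are separated by white space, that pathnames contain no spaces, and that blank lines can be skipped.

```python
def parse_job_list(text):
    """Parse job list lines of the form: field1 field2 [field3]."""
    jobs = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        parts = line.split()
        if len(parts) not in (2, 3):
            raise ValueError("line %d: expected 2 or 3 fields" % number)
        path, job_type = parts[0], parts[1]
        if job_type not in ("JCL", "ARD"):
            raise ValueError("line %d: field2 must be JCL or ARD" % number)
        default_job = parts[2] if len(parts) == 3 else None
        jobs.append({"path": path, "type": job_type, "default_job": default_job})
    return jobs


example = """/home/dave/data/SH5RETAG JCL Proc1
/home/dave/data/PR1SHIPT JCL Proc5
"""
for job in parse_job_list(example):
    print(job)
```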

Control List File


The control list file provides the location of the control member files.
Each line of the file must contain the full path for the directory in
which control member files are located.
An example of a control list file is:
/home/dave/project1/control
/home/dave
The order of the entries in this file determines the order that jcl_cnv
searches directories. If two control files with the same name exist in
two different directories, the control file in the directory listed first is
used.
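
The first-match search order can be illustrated with a small Python sketch. It mimics the documented behavior and is not the jcl_cnv implementation. The function name and the is_file parameter are assumptions made so that the logic can be tried without real files.

```python
import os


def find_control_member(name, control_dirs, is_file=os.path.isfile):
    """Return the first path at which a control member is found, or None.

    Directories are searched in the order in which they are listed,
    so an earlier directory takes precedence over a later one.
    """
    for directory in control_dirs:
        candidate = os.path.join(directory, name)
        if is_file(candidate):
            return candidate
    return None
```

With the example control list file above, a member that exists in both directories is taken from /home/dave/project1/control.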

Note: If the conversion cannot locate any control files for an operation,
the conversion continues, but the IMF file will not contain any
operation specifications, only data file information. You will have
to define each stage in the job.


Data Definition List File


The data definition list file lists all files containing the metadata for the
project data files, such as the data repository. Each line of this file
must contain the full path of a data definition file.
Examples of file entries:
/home/dave/data/DDFILE1
/home/dave/data/DDFILE2
If you do not include a data definition file or definitions for data fields,
the conversion creates the data file and field definition from the job
files and control member files and constructs and assigns unique field
names for each missing data file definition.

Using the jcl_cnv Command


The jcl_cnv command is provided with the QualityStage server running
on either a UNIX or Windows server. Use this command to consolidate
and convert all project component files into one IMF file. You can
convert an entire project or a single job file with this command.
The format for jcl_cnv is:
jcl_cnv {-p proc_list | -i jcl_file | -m ard_file} -E
imf_filename [-d datadef_list] [-c control_list] [-n
project_name] [-C check_log] [-v]

Argument          Description
-p proc_list      (Required unless -i or -m is used) The full path of
                  the job list file.
-i jcl_file       The full path of a single JCL job file. Used in place
                  of -p proc_list.
-m ard_file       The full path of a single ARD job file. Used in place
                  of -p proc_list.
-E imf_filename   (Required) The full path for the output IMF file.
-d datadef_list   (Optional) The full path of the data definition list
                  file.
-c control_list   (Optional) The full pathname of the control list file.
                  If not specified, jcl_cnv looks in the current working
                  directory for control member files.
-n project_name   (Optional) A default project name if no project name
                  is found within the job files.
-C check_log      (Optional) Logs as a warning any occurrence of a
                  filename greater than eight characters.
-v                (Optional) Turns on verbose-mode messages for
                  locating operations with record length conflicts.
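
As an example of combining these arguments, the following Python sketch assembles a jcl_cnv command line. It uses only the arguments documented above. The function name and every path shown are hypothetical, and the sketch prints the command rather than running it.

```python
import shlex


def build_jcl_cnv_command(imf_filename, proc_list=None, jcl_file=None,
                          ard_file=None, datadef_list=None, control_list=None,
                          project_name=None, check_log=None, verbose=False):
    """Assemble a jcl_cnv argument list from the documented options."""
    # Exactly one of -p, -i, or -m identifies the job input.
    sources = [("-p", proc_list), ("-i", jcl_file), ("-m", ard_file)]
    chosen = [(flag, value) for flag, value in sources if value]
    if len(chosen) != 1:
        raise ValueError("specify exactly one of proc_list, jcl_file, ard_file")

    argv = ["jcl_cnv", chosen[0][0], chosen[0][1], "-E", imf_filename]
    optional = [("-d", datadef_list), ("-c", control_list),
                ("-n", project_name), ("-C", check_log)]
    for flag, value in optional:
        if value:
            argv += [flag, value]
    if verbose:
        argv.append("-v")
    return argv


# Hypothetical paths, for illustration only.
command = build_jcl_cnv_command(
    "/home/dave/project1.imf",
    proc_list="/home/dave/lists/jobs.lst",
    datadef_list="/home/dave/lists/datadefs.lst",
    control_list="/home/dave/lists/controls.lst",
    verbose=True,
)
print(shlex.join(command))
```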

Conversion Issues
This section describes some issues around data file definitions and the
Sort stage you might encounter after the conversion.

Converting Data File Definitions


The conversion process takes the file record length and field
information from the data definition files. If you do not have data
definition files, the conversion process uses the job files and the
control files to define the data files.
You might observe the following conditions, which can cause problems
when running the imported project:
• Incomplete data definition.
If the data definition is incomplete or missing, some file record
lengths might not be found. You will receive warning messages
from jcl_cnv stating that the record length for the file could not be
determined.
• Multiple record length definitions.
If multiple record lengths are found for the same file, you will
receive warning messages from jcl_cnv indicating the number of
different record length definitions for the file. (The -v argument
provides detailed warning messages.)
• Fields beyond defined record length.
If fields extend beyond the defined record length for the file, you
will receive a warning message from jcl_cnv. QualityStage
Designer calculates the record length based on the maximum
extent of the fields defined for a file. That is, QualityStage
Designer does not use the record length defined in the control
files.
• Record length exceeds field definitions.
If the data definition indicates a record length longer than any
defined fields, jcl_cnv adds a field definition that starts at position
one for the length of the defined record length to maintain the
stated record length for the file. For example, if the record length
is 80 and the last field starts at 17 for 22, jcl_cnv adds a field that
starts at 1 for 80 characters to define the entire record length.
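
The last two conditions can be illustrated with a Python sketch that compares the extent of the defined fields with the stated record length. This is not the jcl_cnv implementation. The function name and the FILLER field name are hypothetical, and fields are assumed to be (name, start, length) tuples with 1-based starting positions.

```python
def reconcile_record_length(fields, declared_length):
    """Compare field extents with the declared record length.

    Returns (fields, record_length, warning). The last position covered
    by a field is start + length - 1.
    """
    extent = max((start + length - 1 for _name, start, length in fields), default=0)
    fields = list(fields)
    warning = None
    if extent > declared_length:
        # Fields beyond the defined record length: the field extent wins.
        warning = "fields extend to position %d, beyond record length %d" % (
            extent, declared_length)
    elif extent < declared_length:
        # Record length exceeds the field definitions: add one field that
        # starts at 1 and spans the whole record ("FILLER" is a made-up name).
        fields.append(("FILLER", 1, declared_length))
        extent = declared_length
    return fields, extent, warning
```

With a record length of 80 and a last field that starts at 17 for 22 characters, the fields end at position 38, so a field starting at 1 for 80 characters is added.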

Important: Always check the file and field definitions using the Data File
Wizard and Data Field Wizard of the QualityStage Designer.
Update any definitions, if necessary, before running the project.

Sort Operations
Sort stages that precede a Build, a Unijoin, or a Collapse stage are
removed from the job if the Sort stage only sorts the fields used by the
stages and the input and output files are the same file. QualityStage
Designer automatically performs a Sort operation before a Build, a
Collapse, or a Unijoin stage.
Although these Sort operations are not listed as operations in the job,
they are added to the ARD when the job is compiled. Leaving in the
Sort operations from the original job would result in duplicate Sort
operations, which increases processing time.
If you have Sort operations on input files (and reference files) to the
Build, Collapse, and Unijoin stages that sort other fields or additional
fields, jcl_cnv issues a warning message indicating the number of such
sorts.

If you use the –v argument, you receive a more detailed warning
message. This message indicates the stage and the step that might be
impacted by the Sort operation automatically generated by
QualityStage Designer. You need to review these operations and make
appropriate changes to achieve the desired effect when running the
job.

Limited Operation Specifications


The following jobs and stages are not yet fully supported with jcl_cnv:
• Investigate
• Match
• Standardize
• Survive
• Rollup
The support is currently limited to defining files and operational step
sequence within jobs.

Note: QualityStage Designer does not support Rollup.

Understanding Conversion Problems


When encountering errors in the project files, jcl_cnv issues warnings
or fatal error messages. This section describes the two types of error.

Warning Messages
Warning messages typically indicate invalid specifications in control
files or missing file or field information. The jcl_cnv program continues
converting the project files and creates an IMF file if only
warning-level errors are encountered.
You should review all warning messages to determine if specifications
need to be adjusted or file definitions need to be updated before
importing or running the project using QualityStage Designer. If you
are able, correct the cause of the warning messages in the input files
and rerun jcl_cnv to create a complete and accurate IMF of the project.

Fatal Error Messages


Fatal error messages indicate that errors, such as incomplete operation
specifications or out-of-memory conditions, have prevented jcl_cnv
from completely converting the project files. If jcl_cnv encounters
serious errors, it issues messages and continues converting as much of
the project as possible.

The resulting IMF file is incomplete. However, jcl_cnv generates two
additional files:
• imf.op.tmp, which contains a portion of the generated operation
table.
• imf.sc.tmp, which contains a portion of the generated job table.
These files are located in the directory that you specified with the
–E imf_filename argument. You can use these files to determine
where in the jobs and operations the fatal errors occurred. You should
delete these files before rerunning jcl_cnv after resolving the problems.

Log Files
If you receive a non-fatal error message, it may refer you to a log file.
The log file is stored in your system’s temporary directory (usually
C:/TEMP).
The log file is named IBTxxxxxx.LOG, where xxxxxx represents the date
the log file was created in yymmdd format. For example, a file named:
IBT010319.LOG
indicates that the log file was created on March 19, 2001.
To find the log file on Windows:
1. Open the Control Panel.
2. Click System.
3. On the Advanced or Environment tab, look for the environment
variable definitions. The log file is located in the directory listed as
the path for the TEMP or TMP variable.
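
The log file lookup described above can also be scripted. The following is a minimal sketch, not part of the product: the function names are hypothetical, and it assumes the six digits in the file name are in yymmdd order (as in the IBT010319.LOG example) and that two-digit years follow Python's default pivot (00-68 map to 2000-2068).

```python
import os
import re
from datetime import datetime

# Pattern for log names of the form IBTyymmdd.LOG (yymmdd order is an assumption).
LOG_NAME = re.compile(r"^IBT(\d{6})\.LOG$", re.IGNORECASE)


def log_file_date(filename):
    """Return the date encoded in an IBTyymmdd.LOG name, or None if it does not fit."""
    match = LOG_NAME.match(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%y%m%d").date()
    except ValueError:
        # The six digits do not form a valid calendar date.
        return None


def temp_directory():
    """Return the directory named by the TEMP or TMP variable, or None."""
    return os.environ.get("TEMP") or os.environ.get("TMP")


def find_log_files(directory):
    """List (filename, date) pairs for IBT log files in a directory, oldest first."""
    found = []
    for name in os.listdir(directory):
        created = log_file_date(name)
        if created is not None:
            found.append((name, created))
    return sorted(found, key=lambda pair: pair[1])
```

For example, find_log_files(temp_directory()) lists the log files in the temporary directory when TEMP or TMP is set.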

Things to Check After Import


After completing the import process, check the following:
• Unijoin Groups may need to be redefined in QualityStage
Designer.
• Unijoin Save Language is imported as text; you should check to
make sure that any fields referenced in the Save Language are
defined in the data file. If not, you should add those fields to your
data file definition and redeploy the stage.

Note: QualityStage Designer uses the 0 suffix to indicate fields in the
output file.


B

Match Comparisons

This appendix provides a detailed description of the Match
comparisons that you specify in the Match Pass dialog box.

Note: For Undup runs, there is no FileB; matches are done among the
records on FileA.

ABS_DIFF Comparison
The ABS_DIFF comparison is an absolute difference comparison that
compares two numbers, such as ages, and assigns the full agreement
weight if the numbers agree within a tolerance that you specify. If the
numbers do not agree within that tolerance, the full disagreement
weight is assigned. You can use this comparison with arrays.
You must specify the following two fields:

Field Description
varA The number from FileA
varB The number from FileB

The ABS_DIFF comparison requires the following parameter:

Parameter Description
Param 1 The maximum absolute difference between the values of the
fields that can be tolerated.

For example, you are comparing ages and specify 5 for Param 1. If
the ages being compared are 24 and 26, the absolute difference is 2
and the full agreement weight is assigned. If the ages are 45 and 52,
the absolute difference is 7 and the full disagreement weight is
assigned.
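
The age example can be expressed as a short sketch. The function name and the weight arguments are hypothetical, and the treatment of a difference exactly equal to Param 1 is an assumption, since the text does not state which way that boundary falls.

```python
def abs_diff_weight(var_a, var_b, param1, agreement_weight, disagreement_weight):
    """Return the weight for an ABS_DIFF-style comparison of two numbers."""
    # Assumption: a difference exactly equal to Param 1 still counts as agreement.
    if abs(var_a - var_b) <= param1:
        return agreement_weight
    return disagreement_weight
```

With Param 1 set to 5, ages 24 and 26 receive the agreement weight, and ages 45 and 52 receive the disagreement weight.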

AN_DINT Comparison
The AN_DINT comparison is a double alphanumeric left/right interval
comparison that compares house numbers in Census Bureau Tiger
files, the Etak files, the GDT DynaMap files, or the U.S. Post Office
ZIP+4 files. A single house number, which might contain alpha
characters, is compared to two intervals.
One interval represents the left side of the street and the other
represents the right side of the street; for example, 123A to the
intervals 101-199 and 100-198. For a number to match to an interval,
both the parity (odd/even) and the range must agree. This comparison
causes a special flag to be set to indicate whether the left or the right
interval matched.
You specify the following five fields:

Field Description
varA The number on FileA
varB1 The beginning of the interval range for one side of the street
(such as from left) from FileB
varB2 The ending of the interval range for one side of the street
(such as to left) from FileB
varB3 The beginning of the interval range for the other side of the
street (such as from right) from FileB
varB4 The ending of the interval range for the other side of the
street (such as to right) from FileB

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number.
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values.

The beginning number of an interval can be higher than the ending
number and still match; that is, the files can have a high address in
the FROM field and a low address in the TO field. For example, 153
matches both the range 199-101 and the range 101-199.

AN_INT Comparison
The AN_INT comparison is an alphanumeric odd/even interval
comparison that compares a single number on FileA to an interval or
range of numbers on FileB. These numbers can contain alphanumeric
suffixes or prefixes. The number must agree in parity with the low
range of the interval. For example, an interval such as 123A to 123C is
valid and contains the numbers 123A, 123B and 123C.
A single number on FileA is compared to an interval on FileB. If the
number on FileA is odd, the begin range number on FileB must also be
odd to be considered a match. Similarly, if the number on FileA is
even, the begin range on FileB must be even to be considered a match.

Interval comparison types are primarily used for geocoding
applications, such as postal address matching in which you are
matching 123A Main St to the range 121 to 123C Main St. In these
cases, FileB is the reference file. The single number on FileA must be
within the interval, including the end points, to be considered a
match.
You specify the following three fields:

Field Description
varA The number on FileA
varB1 The beginning of the interval range from FileB
varB2 The ending of the interval range from FileB

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number.
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values.

The beginning number of the interval can be higher than the ending
number and still match; that is, the files can have a high address in
the FROM field and a low address in the TO field. For example, 153
matches both the range 199-101 and the range 101-199.

CHAR Comparison
The CHAR comparison is a character-by-character comparison. If one
field is shorter than the other, the shorter field is padded with trailing
blanks to match the length of the longer field. Any mismatched
character causes the disagreement weight to be assigned. You can use
the CHAR comparison with arrays and reverse matching.

You specify the following two fields:

Field Description
varA The character string from FileA
varB The character string from FileB

CNT_DIFF Comparison
The CNT_DIFF comparison counts keying errors in numeric fields,
such as dates, telephone numbers, file or record numbers, and Social
Security numbers. For example, you have the following birth dates
appearing on both files, and you suspect that these represent the same
birth date with a data entry error on the month (03 vs. 08):
19670301
19670801
You can use this comparison with arrays and reverse matching.
You specify the following two fields:

Field Description
varA The number from FileA
varB The number from FileB

The CNT_DIFF comparison requires the following parameter:

Parameter Description
Param 1 Indicates how many keying errors will be tolerated.

If you specify 1, no errors result in the agreement weight being
assigned. One error results in assigning 1/2 of (agreement weight +
disagreement weight), that is, the agreement weight minus 1/2 the
weight range from agreement to disagreement. Two or more errors result
in the disagreement weight. Because the disagreement weight is always a
negative number, one error yields a partial weight.
If you specify 2, the errors are divided into thirds. One error results in
assigning the agreement weight minus 1/3 the weight range from
agreement to disagreement. Two errors would receive the agreement
weight minus 2/3 the weight range, and so on. Thus, the weights are
prorated according to the seriousness of the disagreement.
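
The prorating described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and counting a keying error as one differing character position between equal-length values is an assumption, because the guide does not define how errors are counted.

```python
def count_keying_errors(var_a, var_b):
    """Count the positions at which two equal-length strings differ (assumed error measure)."""
    if len(var_a) != len(var_b):
        raise ValueError("this sketch only handles values of equal length")
    return sum(1 for a, b in zip(var_a, var_b) if a != b)


def cnt_diff_weight(var_a, var_b, param1, agreement_weight, disagreement_weight):
    """Prorate the weight by the number of errors, as the Param 1 examples describe."""
    errors = count_keying_errors(var_a, var_b)
    if errors > param1:
        return disagreement_weight
    weight_range = agreement_weight - disagreement_weight
    # Each tolerated error removes 1/(Param 1 + 1) of the weight range.
    return agreement_weight - weight_range * errors / (param1 + 1)
```

For the birth dates 19670301 and 19670801 with Param 1 set to 1, an agreement weight of 4, and a disagreement weight of -2, the result is 1, which is half of (4 + -2).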

D_INT Comparison
The D_INT comparison is a left/right interval comparison that
compares house numbers in Census Bureau Tiger files, the Etak files,
or the GDT DynaMap files. A single house number is compared to two
intervals. One interval represents the left side of the street and the
other represents the right side of the street. For a number to match to
an interval, both the parity (odd/even) and the range must agree.
You specify the following five field names:

Field Description
varA The number on FileA
varB1 The beginning range of the interval for one side of the
street (such as from left) from FileB
varB2 The ending range of the interval for one side of the
street (such as to left) from FileB
varB3 The beginning range of the interval for the other side
of the street (such as from right) from FileB
varB4 The ending range of the interval for the other side of
the street (such as to right) from FileB

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number.
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values.
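
The left/right test can be sketched for purely numeric house numbers. The names below are hypothetical, and several points are assumptions rather than documented behavior: parity is checked against the beginning number of each interval, the left interval is tried before the right, reversed ranges are accepted, and the ZERO_VALID and ZERO_NULL modes are not modeled.

```python
def in_interval_with_parity(number, begin, end):
    """True if number lies in the interval and has the same parity as its beginning number."""
    low, high = min(begin, end), max(begin, end)
    same_parity = number % 2 == begin % 2
    return same_parity and low <= number <= high


def d_int_side(var_a, var_b1, var_b2, var_b3, var_b4):
    """Return "left", "right", or None for a D_INT-style comparison."""
    # Assumption: varB1/varB2 describe the left side and varB3/varB4 the right side.
    if in_interval_with_parity(var_a, var_b1, var_b2):
        return "left"
    if in_interval_with_parity(var_a, var_b3, var_b4):
        return "right"
    return None
```

With a left interval of 101-199 and a right interval of 100-198, house number 123 falls on the left side and 124 on the right side.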

D_USPS Comparison
The D_USPS comparison is a left/right USPS interval comparison
that processes United States Postal Service (USPS) ZIP+4 files or
other files that might contain non-numeric address ranges. The
D_USPS comparison requires the field names for the house number
(generally on FileA), two intervals for house number ranges on FileB,
and control fields that indicate the parity of the house number range.
You specify the following seven fields:

Field Description
varA The number from FileA
varB1 The beginning range of the interval for one side of the
street (such as from left) from FileB
varB2 The ending range of the interval for one side of the
street (such as to left) from FileB
varB3 The beginning range of the interval for the other side
of the street (such as from right) from FileB
varB4 The ending range of the interval for the other side of
the street (such as to right) from FileB
Bcontrol1 The odd/even parity for the range defined with varB1
and varB2
Bcontrol2 The odd/even parity for the range defined with varB3
and varB4

The control information, from the USPS ZIP+4 file, is:

Control Description
O The range represents only odd house numbers.
E The range represents only even house numbers.
B The range represents all numbers (both odd and even)
in the interval.
U The parity of the range is unknown.

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number.
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values.

A house number on FileA is first compared to the interval range
defined with varB1 and varB2. If the parity of the house number agrees
with the code defined with Bcontrol1 and with the parity of the house
number defined with varB1, and the intervals overlap, it is considered
a match. If not, the house number on FileA is next compared to the
interval defined with varB3 and varB4.

DATE8 Comparison
The DATE8 comparison allows tolerances in dates, taking into
account the number of days in a month and leap years. The supported
date format is yyyymmdd.

You can use this comparison with arrays and reverse matching. You
specify the following two fields:

Field Description
varA The date from FileA
varB The date from FileB

The DATE8 comparison requires at least one and can use two
parameters:

Parameter Description
Param 1 The number of days difference that can be tolerated. If
you only specify Param 1, this is the number of days
that can be tolerated for either varB greater than varA
or varA greater than varB.
If you specified both parameters, Param 1 is the
number of days tolerated for varB greater than varA.
Param 2 The number of days difference that can be tolerated
when varB is less than varA.

For example, you are matching on birth date and specified a 1 for
Param 1. If the birth dates differ by one day, the weight is the
agreement weight minus 1/2 of the weight range from agreement to
disagreement.
Two or more days difference results in a disagreement weight.
Similarly, if the value were 2, one day difference reduces the
agreement weight by 1/3 of the weight range and two days by 2/3.
An example is matching highway crashes to hospital admissions. A
hospital admission cannot occur before the accident date to be related
to the accident. You might specify a 1 for Param 1, which allows the
admission date to be one day later (greater) than the crash date, and a
0 for Param 2, which does not allow an admission date earlier than the
crash date.
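
The two-parameter behavior can be sketched as follows. The function name is hypothetical, and the prorating formula is inferred from the examples above (a tolerance of n days splits the weight range into n + 1 equal steps); treat it as an illustration under that assumption.

```python
from datetime import datetime


def date8_weight(var_a, var_b, agreement_weight, disagreement_weight, param1, param2=None):
    """Prorate the weight by the difference in days between two yyyymmdd dates."""
    date_a = datetime.strptime(var_a, "%Y%m%d").date()
    date_b = datetime.strptime(var_b, "%Y%m%d").date()
    days = (date_b - date_a).days  # positive when varB is later than varA
    # Param 1 applies when varB >= varA, or in both directions if Param 2 is omitted.
    if days >= 0 or param2 is None:
        tolerance = param1
    else:
        tolerance = param2
    difference = abs(days)
    if difference > tolerance:
        return disagreement_weight
    weight_range = agreement_weight - disagreement_weight
    return agreement_weight - weight_range * difference / (tolerance + 1)
```

In the crash example, varA is the crash date and varB is the admission date. With Param 1 set to 1 and Param 2 set to 0, an admission one day after the crash receives a partial weight, and an admission one day before the crash receives the disagreement weight.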

DELTA_PERCENT Comparison
The DELTA_PERCENT comparison compares fields in which the
difference is measured in percentage, such as 10% difference in ages.
For example, a one year difference for an 85 year-old is less significant
than for a 3 year-old, but a 10% difference for each is more
meaningful. You can use this comparison with arrays and reverse
matching.
You specify the following two fields:

Field Description
varA The value from FileA
varB The value from FileB

The DELTA_PERCENT comparison requires at least one and can use
two parameters:

Parameter Description
Param 1 The percentage difference that can be tolerated. If you
only specify Param 1, this is the percentage that can
be tolerated for either varB greater than varA or varA
greater than varB.
If you specified both parameters, Param 1 is the
percentage tolerated for varB greater than varA.
Param 2 The maximum percentage difference that can be
tolerated when the value from varB is less than varA.

For example, you are comparing age in two files. If you want tolerance
of a ten percent difference in the values, specify 10 for Param 1. A one
percent difference subtracts 1/11 of the weight range (the difference
between the agreement and disagreement weight) from the agreement
weight. A 10 percent difference subtracts 10/11 of the weight range.
You would specify Param 2 = 5 if you want a five percent tolerance
when varB is less than varA.
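
The 1/11 and 10/11 figures follow from dividing the weight range into Param + 1 steps. The sketch below illustrates that arithmetic. The function name is hypothetical, and two points are assumptions: the percentage is measured relative to the varA value, and fractional percentages are prorated continuously.

```python
def delta_percent_weight(var_a, var_b, agreement_weight, disagreement_weight,
                         param1, param2=None):
    """Prorate the weight by the percentage difference between two values."""
    if var_a == 0:
        raise ValueError("percentage difference is undefined when varA is zero")
    # Assumption: the difference is expressed as a percentage of varA.
    percent = abs(var_b - var_a) / abs(var_a) * 100
    # Param 1 applies when varB >= varA, or in both directions if Param 2 is omitted.
    if var_b >= var_a or param2 is None:
        tolerance = param1
    else:
        tolerance = param2
    if percent > tolerance:
        return disagreement_weight
    weight_range = agreement_weight - disagreement_weight
    return agreement_weight - weight_range * percent / (tolerance + 1)
```

With an agreement weight of 5.5, a disagreement weight of -5.5, and Param 1 set to 10, a one percent difference yields 4.5 (the weight range of 11 reduced by 1/11).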

DISTANCE Comparison
The DISTANCE comparison computes the Pythagorean distance
between two points and prorates the weight on the basis of the
distance between the points. You can use this comparison for
matching geographic coordinates where the farther the points are
from each other, the less weight is applied.
You specify the following four fields:

Field Description
varA1 The X coordinate from FileA
varA2 The Y coordinate from FileA
varB1 The X coordinate from FileB
varB2 The Y coordinate from FileB

The DISTANCE comparison requires the following parameter:

Parameter Description
Param 1 The maximum distance to be tolerated.

The distance is in the units of the coordinates. Coordinates must be
positive or negative integers; decimal places are not permitted. For
example, if the coordinates are in thousandths of a degree, a
maximum distance of 100 tolerates a distance of 0.1 degrees.
If the distance between the points is zero, the agreement weight is
assigned. If the distance is 0.05 degrees, the midpoint between the
agreement and disagreement weight is assigned. If the distance is
greater than 0.1 degree, the disagreement weight is assigned.
Frequency analysis is not run on distance values.
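
The distance prorating can be sketched as a linear interpolation. The function name is hypothetical, and the linear formula is inferred from the three cases above (zero distance, half the maximum, and beyond the maximum).

```python
import math


def distance_weight(a_x, a_y, b_x, b_y, param1, agreement_weight, disagreement_weight):
    """Prorate the weight by the Pythagorean distance between two integer points."""
    distance = math.hypot(b_x - a_x, b_y - a_y)
    if distance == 0:
        return agreement_weight
    if distance > param1:
        return disagreement_weight
    weight_range = agreement_weight - disagreement_weight
    # Linear proration: half the maximum distance gives the midpoint weight.
    return agreement_weight - weight_range * distance / param1
```

With coordinates in thousandths of a degree and Param 1 set to 100, two points that are 50 units apart receive the midpoint between the agreement and disagreement weights.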

INT_TO_INT Comparison
The INT_TO_INT comparison matches if an interval on one file
overlaps or is fully contained in an interval in another file. You could
use this comparison type for comparing hospital admission dates to
see if hospital stays are partially concurrent, or for matching two
geographic reference files containing ranges of addresses. You can use
this comparison with arrays and reverse matching.
You specify the following four fields:

Field Description
varA1 The beginning range of the interval from FileA
varA2 The ending range of the interval from FileA
varB1 The beginning range of the interval from FileB
varB2 The ending range of the interval from FileB

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number.
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values.

The following example illustrates interval-to-interval comparisons:

Interval from FileA is:
19931023 19931031

Interval from FileB is:
19931025 19931102   Matches because 19931031 falls within the interval on FileB
19930901 19931225   Matches because the interval from FileA falls within the interval on FileB
19930920 19931025   Matches because 19931023 falls within the interval on FileB
19931030 19940123   Matches because 19931031 falls within the interval on FileB
19930901 19930922   Does not match because the interval from FileA does not overlap the interval on FileB
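
The five cases above reduce to a standard overlap test. The following sketch uses a hypothetical function name and treats yyyymmdd values as integers. Accepting the beginning and ending values in either order is an assumption, and the ZERO_VALID and ZERO_NULL modes are not modeled.

```python
def intervals_overlap(var_a1, var_a2, var_b1, var_b2):
    """True if the FileA interval overlaps or is contained in the FileB interval."""
    a_low, a_high = sorted((var_a1, var_a2))
    b_low, b_high = sorted((var_b1, var_b2))
    # Two closed intervals overlap when each one starts before the other ends.
    return a_low <= b_high and b_low <= a_high
```

For example, intervals_overlap(19931023, 19931031, 19930901, 19930922) is false, because the FileB interval ends before the FileA interval begins.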

INTERVAL_NOPAR Comparison
The INTERVAL_NOPAR comparison is an interval noparity
comparison that compares a single number on FileA to an interval
(range of numbers) on FileB. Interval comparisons are primarily used
for geocoding applications, where FileB is the reference file. The single
number must be within the interval (including the end points) to be
considered a match. Otherwise, it is a disagreement.
You specify the following three fields:

Field Description
varA The number from FileA
varB1 The beginning of the range of the interval from FileB
varB2 The ending of the range of the interval from FileB

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values

The begin number of the intervals can be higher than the end number
and still match; that is, the files can have a high address in the
beginning range field and a low address in the ending range field. For
example, 153 matches both the range 200-100 and the range 100-200.
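
A minimal sketch of the no-parity interval test follows. The function name is hypothetical. Returning None to represent a missing value under ZERO_NULL is an assumption about how that mode behaves, and blank fields are not modeled.

```python
def interval_nopar_matches(var_a, var_b1, var_b2, mode="ZERO_VALID"):
    """Return True or False for the interval test, or None when a value is treated as missing."""
    if mode not in ("ZERO_VALID", "ZERO_NULL"):
        raise ValueError("mode must be ZERO_VALID or ZERO_NULL")
    # Assumption: under ZERO_NULL, a zero in any field makes the comparison missing.
    if mode == "ZERO_NULL" and 0 in (var_a, var_b1, var_b2):
        return None
    # The beginning number can be higher than the ending number.
    low, high = min(var_b1, var_b2), max(var_b1, var_b2)
    return low <= var_a <= high
```

As in the example above, 153 matches both the range 200-100 and the range 100-200.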

INTERVAL_PARITY Comparison
The INTERVAL_PARITY comparison is an odd/even interval
comparison that is identical to the INTERVAL_NOPAR comparison,
except that the number must agree in parity with the parity of the low
range of the interval. A single number on FileA is compared to an
interval on FileB. If the number on FileA is odd, the begin range
number on FileB must also be odd to be considered a match. Similarly,
if the number on FileA is even, the begin range on FileB must be even
to be considered a match.
You specify the following three fields:

Field Description
varA The number from FileA
varB1 The beginning range of the interval from FileB
varB2 The ending range of the interval from FileB

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values

This type of comparison is used primarily in geocoding applications to
compare a house number on FileA to a range of addresses on FileB.
Reference files such as the ZIP+4 files have a single odd or even
interval.
The begin number of the intervals can be higher than the end number
and still match; that is, the files can have a high address in the
beginning range field and a low address in the ending range field. For
example, 153 matches both the range 199-101 and the range 101-199.

LR_CHAR Comparison
The LR_CHAR comparison is a left/right character string comparison
that can compare place and ZIP code information in geocoding
applications. A single field on the user data file must be matched to
the two fields on FileB on a character-by-character basis.
Census Bureau Tiger files and other geographic reference files contain
a left ZIP code and a right ZIP code, a left city code and a right city
code. The left code applies if there was a match to the left address
range interval and the right code applies if there was a match to the
right address range.

You specify the following three fields:

Field Description
varA The field from FileA
varB1 The left field (ZIP, city, etc.) from FileB
varB2 The right field (ZIP, city, etc.) from FileB

The LR_CHAR requires a mode:

Mode Description
EITHER VarA must match varB1 or varB2 (or both) to receive
the full agreement weight.
BASED_PREV Use the result of a previous D_INT comparison to
decide which field to compare.

If you specify the EITHER mode, varA must match varB1, varB2, or both
to receive an agreement weight. If you specify the BASED_PREV mode,
varA must match varB1 if varA matched the left interval in a previous
D_INT comparison (or a similar double interval comparison), or varA
must match varB2 if varA matched the right interval in that
comparison. If neither the left nor the right interval agrees, the
missing weight for the field is assigned.

LR_UNCERT Comparison
The LR_UNCERT comparison is a left/right uncertainty string
comparison that is used in conjunction with geocoding applications for
comparing place information. Census Bureau Tiger files and other
geographic reference files contain a left ZIP code and a right ZIP code,
a left city code and a right city code, etc.

You specify the following three fields:

Field Description
varA The field from FileA
varB1 The left field (city, for example) from FileB
varB2 The right field (city, for example) from FileB

The LR_UNCERT comparison requires the following parameter:

Parameter Description
Param 1 The minimum threshold, which is a number between 0
and 900. Use the following guidelines:
900 The two strings are identical
850 The two strings can be considered the same
800 The two strings are probably the same
750 The two strings are probably different
700 The two strings are different

The LR_UNCERT requires a mode:

Mode Description
EITHER The contents of varA must match one or both of the
varB fields specified to receive the full agreement
weight.
BASED_PREV Use the result of a previous LR_UNCERT comparison
to decide which field to compare.

The LR_UNCERT comparison operates identically to the LR_CHAR
comparison except this comparison uses the uncertainty character
algorithm. For example, a field on FileA named CITY_NAME is
being matched to either the LEFT_CITY field or the RIGHT_CITY
field on FileB. Since there could be some error in the CITY_NAME
field, a minimum threshold of 700.0 is included.

MULT_EXACT Comparison
The MULT_EXACT comparison compares all words in the field on one
record with all words in the field on the second record. This comparison is
similar to array matching, except that the individual words are
considered to be the array elements. This type of comparison allows
matching of free-form text where the order of the words may not
matter and where there may be missing words or words in error. The
score is based on the similarity of the fields.
For example:
Building 5 Apartment 4-B
would match:
Apartment 4-B Building 5
You specify the following fields:

Field Description
varA The character string from FileA
varB The character string from FileB
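
The idea of treating words as array elements can be illustrated with a small sketch. This is not the product's scoring method, which the guide does not describe. The function name is hypothetical, and the measure used here (shared words divided by the larger word count) is an assumption chosen only to show order-independent word matching.

```python
from collections import Counter


def word_overlap(var_a, var_b):
    """Return the fraction of words shared by two strings, ignoring word order."""
    words_a = Counter(var_a.split())
    words_b = Counter(var_b.split())
    total = max(sum(words_a.values()), sum(words_b.values()))
    if total == 0:
        return 0.0
    # Multiset intersection counts each word as often as it appears in both strings.
    shared = sum((words_a & words_b).values())
    return shared / total
```

Under this measure, "Building 5 Apartment 4-B" and "Apartment 4-B Building 5" share every word, so the order of the words does not affect the result.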

MULT_RANGE Comparison
The MULT_RANGE comparison matches a single house number to a
list of house number ranges. Each range must be separated by a pipe
symbol (|). The tilde (~) is used to indicate the ranges, since the
hyphen may be a legitimate address suffix (123-A). The prefix “B:” can
be used to signify both odd and even numbers in the range. Otherwise,
the parity of the low number is used.
In this example:
101~199 | B:201~299|456|670 ½| 800-A~898-B|1000~
The following ranges are defined:
101 to 199 odd numbers only
201 to 299 both odd and even numbers
456 (the one house number only)
670 ½ (the one house number only)
800-A to 898-B even numbers only
All even house numbers 1000 or greater.
You specify the following fields:

Field Description
varA The character string from FileA
varB The character string from FileB

Note: The MULT_RANGE comparison is used with the QualityStage
WAVES Stage only.

MULT_UNCERT Comparison
The MULT_UNCERT comparison is identical to MULT_EXACT,
except the uncertainty character comparison routine is used to match
the words. For more information on this uncertainty routine, see
“UNCERT Comparison” on page B-25.
For example:
Bilding 5 Apartment 4B
would be close to:
Apartment 4-B Building 5
You specify the following two fields:

Field Description
varA The character string from FileA
varB The character string from FileB

The MULT_UNCERT comparison requires the following parameter:

Parameter Description
Param 1 The cutoff threshold, which is a number between 0
and 900. Use the following guidelines:
900 The two strings are identical
850 The two strings can be safely considered to be
the same
800 The two strings are probably the same
750 The two strings are probably different
700 The two strings are almost certainly different

The weight assigned is proportioned linearly between the agreement
and disagreement weights, dependent upon how close the score is to
the cutoff threshold. For example, if you specify 700 and the score is
700 or less, then the full disagreement weight is assigned. If the
strings agree exactly, the full agreement weight is assigned.
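
The linear proportioning can be sketched as an interpolation between the cutoff threshold and the maximum score of 900. The function name is hypothetical, and the sketch covers only the conversion of a score to a weight. It does not compute the uncertainty score itself.

```python
MAX_SCORE = 900  # score for two identical strings, per the guidelines above


def uncert_weight(score, threshold, agreement_weight, disagreement_weight):
    """Interpolate linearly between the disagreement and agreement weights."""
    if not 0 <= threshold < MAX_SCORE:
        raise ValueError("threshold must be at least 0 and less than 900")
    if score <= threshold:
        return disagreement_weight
    if score >= MAX_SCORE:
        return agreement_weight
    fraction = (score - threshold) / (MAX_SCORE - threshold)
    return disagreement_weight + fraction * (agreement_weight - disagreement_weight)
```

With a threshold of 700, a score of 800 lies halfway between the threshold and 900, so it receives the midpoint of the two weights.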

NAME_UNCERT Comparison
The NAME_UNCERT comparison compares first names, where one
name might be truncated. This comparison uses the shorter length of
the two names for the comparison and does not compare any
characters after that length.
For example, the following two sets of first names would be considered
exact matches:
AL ALBERT
W WILLIAM
This is different from CHAR where these two names would not match.
The length is computed by ignoring trailing blanks (spaces).
Embedded blanks are not ignored.

You specify the following two fields:

Field Description
varA The first name from FileA
varB The first name from FileB

The NAME_UNCERT comparison requires the following parameter:

Parameter Description
Param 1 The minimum threshold, which is a number between 0
and 900. Use the following guidelines:
900 The two strings are identical
850 The two strings can be safely considered to be
the same
800 The two strings are probably the same
750 The two strings are probably different
700 The two strings are almost certainly different

The weight assigned is proportioned linearly between the agreement
and disagreement weights, dependent upon how close the score is to
the cutoff threshold. For example, if you specify 700 and the score is
700 or less, then the full disagreement weight is assigned. If the
strings agree exactly, the full agreement weight is assigned.

NUMERIC Comparison
The NUMERIC comparison is an algebraic numeric compare. Leading
spaces are converted to zeros and the numbers are compared. You can
use this comparison with arrays and reverse matching.

You specify the following two fields:

Field Description
varA The field from FileA
varB The field from FileB

PREFIX Comparison
The PREFIX comparison compares character strings, one of which
might be truncated. This comparison uses the shorter length of the
two strings for the comparison and does not compare any characters
after that length. You can use this comparison with reverse matching.
For example, a last name of ABECROMBY could be truncated to
ABECROM. The PREFIX comparison considers these two
representations to be an equal match. This is different from CHAR
where these two names would not match. The length is computed by
ignoring trailing blanks (spaces). Embedded blanks are not ignored.
You specify the following two fields:

Field Description
varA The string from FileA
varB The string from FileB

PRORATED Comparison
The PRORATED comparison allows numeric fields to disagree by an
absolute amount that you specify. A difference of zero
between the two fields results in the full agreement weight being
assigned. A difference greater than or equal to the absolute amount
results in the disagreement weight being assigned. Any difference
between zero and the specified absolute amount receives a weight
proportional to the difference. You can use this comparison
with arrays and reverse matching.
Small differences between the fields receive slightly less than the full
agreement weight; large differences receive weights closer to the
disagreement weight.
For example, if the absolute amount is 15 and the value from FileB is
greater by 18 than the value from FileA, the comparison receives the
full disagreement weight. If the value from FileB is greater by 8 than
the value from FileA, the comparison receives a weight roughly midway
between the agreement and disagreement weights.
You specify the following two fields:

Field Description
varA The numeric field from FileA
varB The numeric field from FileB

The PRORATED comparison requires at least one and can use two
parameters:

Parameter Description
Param 1 The absolute value difference that can be tolerated. If
you specify only Param 1, this is the difference that can
be tolerated for either varB greater than varA or varA
greater than varB.
If you specify both parameters, Param 1 is the
difference tolerated for varB greater than varA.
Param 2 The absolute value difference that can be tolerated
when varB is less than varA.

For example, if you are comparing two dates and specify 5 for Param 1
and 7 for Param 2, varB can exceed varA by 5 days, but varA
can exceed varB by 7 days.
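
The proration described above can be sketched in Python. This is only an illustration that assumes a straight-line interpolation between the agreement and disagreement weights; the function name `prorated_weight` and its arguments are hypothetical and are not part of QualityStage.

```python
def prorated_weight(var_a, var_b, agree_w, disagree_w, param1, param2=None):
    """Return a weight prorated by the difference between two numbers.

    Assumption: the weight falls linearly from agree_w (difference of zero)
    to disagree_w (difference at or beyond the tolerance).
    """
    if param2 is None:
        # With only Param 1, the same tolerance applies in both directions.
        param2 = param1
    diff = var_b - var_a
    # Param 1 applies when varB is greater than varA, Param 2 when it is less.
    tolerance = param1 if diff > 0 else param2
    gap = abs(diff)
    if gap == 0:
        return agree_w
    if gap >= tolerance:
        return disagree_w
    return agree_w - (agree_w - disagree_w) * gap / tolerance
```

With a tolerance of 15, a difference of 18 yields the full disagreement weight, and a difference of 8 yields a weight slightly past the midpoint toward disagreement.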


TIME Comparison
The TIME comparison compares times in hours and minutes or only
hours. The time must be in 24-hour format, in which 0 is midnight and
2359 is 11:59 PM. Times can cross midnight because the difference is
always the shortest way around the clock. You can specify an
acceptable maximum time difference in minutes. You can use this
comparison with arrays.
A difference of zero between the two times results in the full
agreement weight being assigned. A difference greater than or equal to
the maximum time difference results in the disagreement weight being
assigned. Any difference between zero and the specified maximum time
difference receives a weight proportional to the difference.
For example, if the maximum time difference is 10 and the times
differ by 12 minutes, the comparison receives the full disagreement
weight. If the times differ by 5 minutes, the comparison receives a
weight between the agreement and disagreement weight. If you want
to specify unequal tolerance, you specify a second time allowance.
You specify the following two fields:

Field Description
varA The time from FileA
varB The time from FileB

The TIME comparison requires at least one and can use two
parameters:

Parameter Description
Param 1 The maximum time difference that can be tolerated. If
you specify only Param 1, this is the difference that can
be tolerated for either varA greater than varB or varB
greater than varA. If you specify both parameters,
Param 1 is the difference tolerated for varB greater
than varA.
Param 2 The maximum time difference that can be tolerated in
the other direction, when varB is less than varA.


For example, if you specify 20 for Param 1 and 14 for Param 2, varB
can exceed varA by 20 minutes, but varA can exceed varB by 14
minutes. The second parameter allows for minor errors in recording
the times.
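
The "shortest way around the clock" rule and the two tolerances can be sketched in Python as follows. The sketch assumes times given as HHMM values (hours-only input is not handled) and a straight-line proration of the weight; all function names are hypothetical.

```python
def minutes_since_midnight(hhmm):
    """Convert a 24-hour HHMM value such as 2359 to minutes since midnight."""
    hours, minutes = divmod(int(hhmm), 100)
    return hours * 60 + minutes


def signed_clock_difference(var_a, var_b):
    """Return varB minus varA in minutes, taking the shortest way around the clock."""
    diff = (minutes_since_midnight(var_b) - minutes_since_midnight(var_a)) % 1440
    # A forward gap of more than 12 hours is shorter when measured backward.
    return diff - 1440 if diff > 720 else diff


def time_weight(var_a, var_b, agree_w, disagree_w, param1, param2=None):
    """Prorate the weight linearly (an assumption) within the allowed tolerance."""
    if param2 is None:
        param2 = param1
    diff = signed_clock_difference(var_a, var_b)
    # Param 1 applies when varB is later than varA, Param 2 when it is earlier.
    tolerance = param1 if diff > 0 else param2
    gap = abs(diff)
    if gap == 0:
        return agree_w
    if gap >= tolerance:
        return disagree_w
    return agree_w - (agree_w - disagree_w) * gap / tolerance
```

For example, 2355 and 0005 differ by 10 minutes in this sketch, not by 23 hours and 50 minutes, because the difference is measured across midnight.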

UNCERT Comparison
The UNCERT comparison is a character comparison that uses an
information-theoretic character comparison algorithm to compare two
character strings. This comparison provides for phonetic errors,
transpositions, random insertion, deletion, and replacement of
characters within strings. You can use this comparison with arrays
and reverse matching.
The weight assigned is based on the difference between the two
strings being compared as a function of the string length (longer words
can tolerate more errors and still be recognizable than shorter words
can), the number of transpositions, and the number of unassigned
insertions, deletions, or replacement of characters.
You specify the following two fields:

Field Description
varA The character string from FileA
varB The character string from FileB


The UNCERT comparison requires the following parameter:

Parameter Description
Param 1 The cutoff threshold, which is a number between 0
and 900. Use the following guidelines:
900 The two strings are identical
850 The two strings can be safely considered to be
the same
800 The two strings are probably the same
750 The two strings are probably different
700 The two strings are almost certainly different

The weight assigned is proportioned linearly between the agreement
and disagreement weights, dependent upon how close the score is to
the cutoff threshold. For example, if you specify 700 and the score is
700 or less, then the full disagreement weight is assigned. If the
strings agree exactly, the full agreement weight is assigned.
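
As an illustration of this linear proportioning, the following Python sketch maps a comparator score onto a weight. It assumes that the score itself is supplied by the comparison (the string comparator is not reproduced here) and that 900 is the score for identical strings; the function name `uncert_weight` is hypothetical.

```python
def uncert_weight(score, cutoff, agree_w, disagree_w, max_score=900):
    """Map a comparator score onto a weight between disagree_w and agree_w.

    Assumption: the weight rises in a straight line from disagree_w at the
    cutoff threshold to agree_w at max_score (identical strings).
    """
    if score >= max_score:
        return agree_w
    if score <= cutoff:
        return disagree_w
    fraction = (score - cutoff) / (max_score - cutoff)
    return disagree_w + (agree_w - disagree_w) * fraction
```

With a cutoff of 700, a score of 800 lies halfway to 900 and therefore receives a weight halfway between the disagreement and agreement weights.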

USPS Comparison
The USPS comparison processes United States Postal Service (USPS)
ZIP+4 files or other files that can contain non-numeric address ranges.
The USPS comparison requires that FileA contain the house number
field and that FileB contain a low house number, a high house
number, and a control field indicating the parity of the house
number range.
You specify the following four fields:

Field Description
varA The house number from FileA
varB1 The ZIP+4 field primary low house number for the
beginning of the range from FileB
varB2 The ZIP+4 field primary high house number for the
ending of the range from FileB
Bcontrol The odd/even parity for the range defined with varB1
and varB2

The control information, from the USPS ZIP+4 code, is:

Control Description
O The range represents only odd house numbers.
E The range represents only even house numbers.
B The range represents all numbers (both odd and even)
in the interval.
U The parity of the range is unknown.

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number.
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values.

USPS_DINT Comparison
The USPS_DINT comparison is an interval to double interval USPS
comparison that compares an interval on FileA to two intervals on
FileB. If the interval on FileA overlaps any part of either interval on
FileB and the parity flags agree, the results match.


FileA requires an address primary low number, an address primary
high number, and an address primary odd/even control; the USPS
ZIP+4 file contains this information. FileB requires two primary low
numbers, two primary high numbers, and two primary odd/even
controls, one for each side of the street.
This comparison is useful for matching the USPS ZIP+4 file to a
geographic reference file such as the Census Bureau TIGER file, GDT
Dynamap, Etak MapBase, or other reference files.
You specify the following nine fields:

Field Description
varA1 The beginning of the street address range from FileA
varA2 The ending of the street address range from FileA
varB1 The beginning of the street address range for one side
of the street (such as from left) from FileB
varB2 The ending of the street address range for one side of
the street (such as to left) from FileB
varB3 The beginning of the street address range for the other
side of the street (such as from right) from FileB
varB4 The ending of the street address range for the other
side of the street (such as to right) from FileB
Acontrol The odd/even parity for the range defined with varA1
and varA2
Bcontrol1 The odd/even parity for the range defined with varB1
and varB2
Bcontrol2 The odd/even parity for the range defined with varB3
and varB4

For an undup run, the controls should be the same field.


The control information from the USPS ZIP+4 code is:

Control Description
O The range represents only odd house numbers.
E The range represents only even house numbers.
B The range represents all numbers (both odd and even)
in the interval.
U The parity of the range is unknown.

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number.
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values.

Agreement weight is assigned when:
• The odd/even control is set to E, O, or B on both FileA and FileB
• The odd/even control is set to E or O on one file and to B on the
other file (such as E on FileA and B on FileB)
Disagreement weight is assigned when the parity on one file is set
to E or O and on the other file is set to the opposite; that is, either
FileA to E and FileB to O or FileA to O and FileB to E.
If all strings are numeric, the comparison performs an integer interval
comparison; otherwise, the comparison performs an alphanumeric
interval comparison.
The interval on FileA is first compared to the first interval defined
with varB1 and varB2. If the odd/even parity agrees (the A control
matches the B1 control or the B2 control) and the intervals overlap,
the intervals are considered a match; for example:

File                 Begin range   End range   Odd/Even
FileA                101           199         O
FileB 1st interval   123           299         B
FileB 2nd interval   124           298         B

The FileA interval matches the interval on FileB defined by varB1
and varB2 because the odd/even parity is compatible (odd on FileA
and both on FileB), and the interval 101-199 overlaps with 123-299.
If the interval on FileA does not match the first interval on FileB, the
FileA interval is compared with the interval on FileB defined by varB3
and varB4 for a match.
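
The overlap and parity checks can be sketched in Python as shown below. The sketch covers integer intervals only (the alphanumeric case is not modeled), assumes that the first agreement rule means the same control value on both files, and raises an error for the U control because its handling is not described here. All names are hypothetical.

```python
VALID_CONTROLS = {"O", "E", "B"}


def parity_compatible(control_a, control_b):
    """Return True when two odd/even controls agree or one of them is B."""
    for control in (control_a, control_b):
        if control not in VALID_CONTROLS:
            raise ValueError(f"parity rule not described for control {control!r}")
    return control_a == control_b or "B" in (control_a, control_b)


def intervals_overlap(low_a, high_a, low_b, high_b):
    """Return True when the two closed integer intervals share any value."""
    return low_a <= high_b and low_b <= high_a


def dint_match(range_a, control_a, sides_b):
    """Compare one FileA interval with each FileB interval in turn.

    sides_b is a list of ((low, high), control) pairs, one per side of the street.
    """
    low_a, high_a = range_a
    for (low_b, high_b), control_b in sides_b:
        if parity_compatible(control_a, control_b) and intervals_overlap(
            low_a, high_a, low_b, high_b
        ):
            return True
    return False
```

Using the values from the table above, `dint_match((101, 199), "O", [((123, 299), "B"), ((124, 298), "B")])` reports a match on the first FileB interval.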

USPS_INT Comparison
The USPS_INT comparison is an interval to interval comparison that
compares an interval on FileA to an interval on FileB. If the interval
on FileA overlaps any part of the interval on FileB and the parity
agrees, the results match.
Both files require an address primary low number, an address
primary high number, and an address primary odd/even control, such
as the USPS ZIP+4 file control field.
You specify the following fields:

Field Description
varA1 The beginning of the street address range from FileA
varA2 The ending of the street address range from FileA
varB1 The beginning of the street address range from FileB
varB2 The ending of the street address range from FileB
Acontrol The odd/even parity for FileA
Bcontrol The odd/even parity for FileB

The control information from the USPS ZIP+4 code is:

Control Description
O The range represents only odd house numbers.
E The range represents only even house numbers.
B The range represents all numbers (both odd and even)
in the interval.
U The parity of the range is unknown.

Interval comparisons require a mode:

Mode Description
ZERO_VALID Indicates zero or blanks should be treated as any other
number.
ZERO_NULL Indicates zero or blank fields should be considered
null or missing values.

Agreement weight is assigned when:


• The odd/even control is set to E, O, or B on both FileA and FileB
• The odd/even control is set to E or O on one file and to B on the
other file (such as E on FileA and B on FileB)
Disagreement weight is assigned when the parity on one file is set to E
or O and on the other file is set to the opposite; that is, either FileA to
E and FileB to O or FileA to O and FileB to E.
If all strings are numeric, the comparison performs an integer interval
comparison; otherwise, the comparison performs an alphanumeric
interval comparison.



C

Rule Set Files

This appendix describes the rule set files used by the Standardize,
Multinational Standardize, and Investigate stages, as well as by the
WAVES stage.
Rule sets are fundamental to the standardization process. They
determine how fields in input records are parsed and classified into
tokens.
You can also create new rule sets using QualityStage Designer.

Features
The features and benefits offered by the rule set file architecture are:
• Support of your Business Intelligence objective by maximizing the
critical information contained within data. The data structures
created by the rules provide comprehensive addressability to all
data elements necessary to meet data storage requirements and
facilitate effective matching.
• Modular design that allows for a “plug and play” approach to
solving complex standardization challenges.
• Flexible approach to input data file format. You do not need to
organize the columns in any particular order.

• Country-specific design enables you to use multinational data. The
rules are designed to recognize and conform to the name and
address conventions used in specific countries.
• Rule override functionality makes standardization jobs easier to
customize.
• Identification and collection of unhandled data and generation of
the corresponding unhandled pattern make it easier to perform
quality assurance and determine necessary customizations.

Rule Set Files


Each rule set requires the following four files:
• Dictionary File (.DCT).
• Classification Table (.CLS).
• Pattern-Action File (.PAT).
• Rule Set Description File (.PRC).
Some rule sets also include:
• Lookup tables (.TBL).
• Override tables (.TBL).
These files and tables are described in the rest of this chapter.
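
As a quick illustration of the required files, the following Python sketch reports which of the four files are missing for a rule set. It assumes that the files share the rule set name as their base name (for example, USADDR.DCT) and sit in a single directory; that naming convention and the function name are assumptions, not documented behavior.

```python
from pathlib import Path

# The four extensions that every rule set requires, as listed above.
REQUIRED_EXTENSIONS = (".DCT", ".CLS", ".PAT", ".PRC")


def missing_rule_set_files(directory, rule_set):
    """Return the required extensions that have no matching file in directory."""
    present = {
        path.suffix.upper()
        for path in Path(directory).iterdir()
        if path.stem.upper() == rule_set.upper()
    }
    return [ext for ext in REQUIRED_EXTENSIONS if ext not in present]
```

An empty list means that all four required files were found; lookup and override tables (.TBL) are optional and are not checked.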

Dictionary File
The Dictionary File defines the fields for the output file for this rule
set. The file holds a list of domain, matching, and reporting fields.
Each field is identified by a two-character abbreviation, for example
CN for City Name. The Dictionary also provides the data type
(character, for instance) and field offset and length information.


The format for the Dictionary File is:


field-identifier / field-type / field-length / missing value identifier /
description [; comments]

Format                    Description
field-identifier          A two-character field name (case insensitive) that
                          must be unique for all dictionaries. The first
                          character must be an alpha character. The second
                          character can be an alpha character or a digit.
                          If this field is overlaid, enter two asterisks (**)
                          for the field-identifier and put the field-identifier
                          in the first two characters of the comments position.
field-type                The type of information in the field (see “Field
                          Types” on page C-5).
field-length              The field length in characters.
missing value identifier  A missing value identifier. The possible values are:
                          S – spaces
                          Z – zero or spaces
                          N – negative number (for example, –1)
                          9 – all nines (for example, 9999)
                          X – no missing value
                          Generally, use X or S for this argument.
description               The field description that appears with the field in
                          the Data File Wizard.
; comments                Optional, unless this is an overlaid field (two
                          asterisks (**) are the field-identifier). Comments,
                          including the field-identifier for an overlaid field,
                          must follow a semicolon (;).
                          Comments can also be placed on a separate line if
                          preceded by a semicolon.

First line The first line of a Dictionary File must be:


\FORMAT\ SORT=N


Comment lines The Dictionary File must also include the following two comment
lines:
• ; Business Intelligence Fields. This comment line must immediately
precede the list of fields.
• ; Matching Fields. This comment line must immediately follow the list
of fields.

Important: If the Dictionary File does not include these two comment lines,
QualityStage Designer cannot display the list of fields.

The following example shows part of a USADDR Dictionary File:


\FORMAT\ SORT=N
;------------------------------------------------------------
; USADDR Dictionary File
;------------------------------------------------------------
; Total Dictionary Length = 415
;------------------------------------------------------------
; Business Intelligence Fields
;------------------------------------------------------------
HN C 10 S HouseNumber ;0001-0010
HS C 10 S HouseNumberSuffix ;0011-0020
.
.
.
AA C 50 S AdditionalAddressInformation ;0187-0236
;-------------------------------------------------------------
; Matching Fields
;-------------------------------------------------------------
.
.
.

Field order The order of fields in the Dictionary File is the order in which the
fields appear in the output file.
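
To illustrate how the field order determines the output layout, the following Python sketch parses Dictionary File entries like those in the USADDR example and derives each field's position from the running total of field lengths. It is a simplified reading of the format: overlaid fields (**) are not handled, and the names `DictionaryField`, `parse_dictionary_line`, and `field_offsets` are hypothetical.

```python
from typing import NamedTuple, Optional


class DictionaryField(NamedTuple):
    identifier: str
    field_type: str
    length: int
    missing: str
    description: str
    comment: str


def parse_dictionary_line(line: str) -> Optional[DictionaryField]:
    """Parse one field entry; return None for comments, the header, and filler."""
    text, _, comment = line.partition(";")
    parts = text.split()
    if len(parts) < 5 or parts[0].startswith("\\"):
        return None
    identifier, field_type, length, missing = parts[:4]
    description = " ".join(parts[4:])
    return DictionaryField(
        identifier, field_type, int(length), missing, description, comment.strip()
    )


def field_offsets(fields):
    """Yield (identifier, start, end) using 1-based positions in file order."""
    start = 1
    for field in fields:
        end = start + field.length - 1
        yield field.identifier, start, end
        start = end + 1
```

For the two leading USADDR entries, the derived positions are 1-10 for HN and 11-20 for HS, which agrees with the position comments in the example file.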


Field Types
The following field types are supported:

Field Type Definition


N Numeric fields, which are right-justified and filled with
leading blanks.
C Alphabetic fields, which are left-justified and filled with
trailing blanks.
NS Numeric field in which leading zeros should be stripped.
For example, you define the house number field as NS
and ZIP codes as N because leading zeros matter with
ZIP codes, but could interfere with matching on the
house number.
M Mixed alphabetics and numerics in which numeric
values are right-justified and alphabetic values are
left-justified. Leading zeros are retained. The U.S.
Postal Service uses this type of field for house numbers
and apartment numbers. For example, a four-character
type M field, where β represents a space, is:
102 becomes β102
A3 becomes A3ββ
MN Mixed name, which is generally used for representing
street name. Field values beginning with an alphabetic
are left-justified. Field values beginning with a number
are indented as if the number is a separate four
character field.
In the following example (β represents a space), the
single digit numbers are indented three spaces, two digit
numbers are indented two spaces, etc. The U.S. Postal
Service uses this type of field for street names in the
ZIP+4 files.
MAIN
CHERRY HILL
βββ2ND
ββ13TH
β123RD
1023RD


Classification Table
The Classification Table allows the standardization process to identify
and classify key words such as street name, street type, directions,
and so on, by providing:
• Standard abbreviations for each word; for example, HWY for
Highway.
• A list of single-character classification tokens that are assigned to
individual data elements during processing.
The header of the Classification Table includes the name of the rule
set and the classification legend, which indicates the classes and their
descriptions.
The Standardize stage uses the Classification Table to identify and
classify key words (or tokens), such as street types (AVE, ST, RD),
street directions (N, NW, S), and titles (MR, DR). The Classification
table also provides standardization for these words.
The format for the Classification Table file is:
token / standard value / class / [threshold-weights] / [; comments]

Format Description
token Spelling of the word as it appears in the input file.
standard value The standardized spelling or representation of the
word in the output file. The standardization process
converts the word to this value. The standardization
can be multiple words, which must be enclosed in
double quotation marks. You can use up to twenty-five
characters.
The standardization can be either an abbreviation for
the word (for example, the direction WEST, WST, or W
is converted to W) or an expansion of the word (for
example, POB converted to “PO BOX”).
class A one-character tag indicating the class of the word.
The class can be any letter, A to Z, or a zero (0),
which indicates a null word.
threshold- Specifies the degree of uncertainty that can be
weights tolerated in the spelling of the word. The weights are:
900 Exact match
800 Strings are almost certainly the same
750 Strings are probably the same
700 Strings are probably different
Lower numbers tolerate more differences between the
strings.
comments Optional, and must follow a semicolon (;). Comments
can also be placed on a separate line if preceded by a
semicolon.

The first line of a Classification Table file must be:


\FORMAT\ SORT=Y
Do not include any comments before this line. This line causes the
table to be sorted on token in virtual memory.
Each token must be a single word. Multiple or compound words (such
as New York, North Dakota, Rhode Island) are considered separate
tokens.

Threshold Weights
The threshold weights specify the degree of uncertainty that can be
tolerated in the spelling of the token. An information-theoretic string
comparator is used that can take into account phonetic errors, random
insertion, deletion and replacement of characters, and transpositions
of characters.
The score is weighted by the length of the word, since small errors in
long words are less serious than errors in short words. In fact, the
threshold should be omitted for short words since errors generally
cannot be tolerated.


The Null Class


If you use a class of zero to indicate that a word is a null (or noise)
word, these words are skipped in the pattern matching process. In the
following example, the standard abbreviation can be any value desired
since it is not used:
OF 0 0
THE 0 0
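
The entry layout can be illustrated with a small Python parser. This is a sketch that assumes entries are separated by white space, that multi-word standard values are enclosed in double quotation marks, and that tokens contain no apostrophes or backslashes; the names `ClassEntry` and `parse_classification_line` are hypothetical.

```python
import shlex
from typing import NamedTuple, Optional


class ClassEntry(NamedTuple):
    token: str
    standard: str
    cls: str
    threshold: Optional[int]


def parse_classification_line(line: str) -> Optional[ClassEntry]:
    """Parse one Classification Table entry; skip comments and the header."""
    text = line.partition(";")[0].strip()
    if not text or text.startswith("\\"):
        return None
    # shlex keeps a quoted standard value such as "PO BOX" as one item.
    parts = shlex.split(text)
    if len(parts) < 3:
        raise ValueError(f"expected token, standard value, and class: {line!r}")
    threshold = int(parts[3]) if len(parts) > 3 else None
    return ClassEntry(parts[0], parts[1], parts[2], threshold)


def is_null_word(entry: ClassEntry) -> bool:
    """Entries with class 0 are noise words that pattern matching skips."""
    return entry.cls == "0"
```

Parsing `OF 0 0` yields an entry whose class is 0, so `is_null_word` returns True for it.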

Pattern-Action File
The Pattern-Action file contains the rules for standardization; that is,
the actions to execute with a given pattern of tokens.
This section first describes the principles behind pattern matching,
tokenization, and classification. It concludes with a description of the
file itself.
For more detailed information about the Pattern-Action file, see the
QualityStage Pattern-Action Reference Guide.

Pattern Matching Principles


This section presents an introduction to the concepts of pattern
matching and the reasons that such matching is required to obtain
correct standardization in all instances.
If all elements of an address are uniquely identified by keywords,
address standardization is easy. The following example is not subject
to any ambiguity. The first field is numeric (house number), the next
is a direction (uniquely identified by the token N), the next is an
unknown word MAPLE, and the last is a street type, AVE:
123 N MAPLE AVE
Most addresses fall into this pattern with minor variations:
123 E MAINE AV
3456 NO CHERRY HILL ROAD
123 SOUTH ELM PLACE


These addresses all fit into the pattern:


Numeric
Direction
Unknown word or words
Street type
The first numeric is interpreted as the house number and must be
moved to the house number field {HN}. The direction is moved to the
pre-direction field {PD}; the street names to the street name field {SN};
and the street type to the {ST} field.
The braces indicate that the reference is to a dictionary field that
defines a field in the output file; for example:

This pattern To this action


Numeric {HN}
Direction {PD}
Unknown word or words {SN}
Street type {ST}

Tokenization and Classification


Standardization begins by separating all of the elements of the
address into tokens. Each token is a word, a number, or a mixture
separated by one or more spaces. At the same time the tokens are
formed, each token is classified by looking to see if the token is in the
classification table file. If the token is there, it is assigned the class
indicated by the table. If it is not in the table, the token is given one of
the following classes:

^ Numeric, containing all digits, such as 1234
? Unknown token, containing one or more words, such as
CHERRY HILL
> Leading numeric, containing numbers followed by one or
more letters, such as 123A
< Leading alpha, containing letters followed by one or
more numbers, such as A3
@ Complex mix, containing a mixture of alpha and numeric
characters that do not fit into either of the above classes,
such as:
123A45
ABC345TR
~ Special, containing special characters that are not
generally encountered in addresses, including !, \, @, ~,
%, etc.
0 Null
- Hyphen
/ Slash
& Ampersand
# Number sign
( Left parenthesis
) Right parenthesis
Note that the special token class (~) includes characters that are
generally not encountered in addresses. The class for this type of
token, if it consists of a single character, can change if it is included in
the SEPLIST.
If a special character is included in the SEPLIST and not in the
STRIPLIST, the token class for that character becomes the character
itself. For example, if a @ appears in the SEPLIST and not in the
STRIPLIST, and if the input contains a @, it appears as a separate
token with token value @ and class value @. Similarly, if the backslash
(\) appears in the SEPLIST and not in the STRIPLIST, its class is \
(the backslash is the escape character; to be used in a pattern, it
must itself be escaped).
A null token is any word that is to be considered noise. These words
can appear in the classification table and are given a type of zero.
Similarly, actions can convert normal tokens into null tokens.
An example of the standard address form is:
123 NO CHERRY HILL ROAD


This address receives the following tokens and classes:

123 ^ Numeric
No D Direction
Cherry Hill ? Unknown words
Road T Street type
The pattern represented by this address can be coded as:
^|D|?|T
The vertical lines separate the operands of a pattern. The address
above matches this pattern. The classification of D comes from the
Classification Table, which has entries such as NO, EAST, E, and NW that are
all given a class of D to indicate that they generally represent
directions. Similarly, the token class of T is given to entries in the
table representing street types, such as ROAD, AVE, and PLACE.
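
Tokenization and classification can be sketched in Python as follows. The small table `CLASS_TABLE` stands in for a real Classification Table and its entries are illustrative only; the sketch splits on spaces, ignores SEPLIST and STRIPLIST handling, does not skip null tokens, and merges consecutive unknown words into a single ? token. All names are hypothetical.

```python
import re

# Illustrative stand-in for a Classification Table: token -> (standard, class).
CLASS_TABLE = {
    "NO": ("N", "D"),
    "N": ("N", "D"),
    "ROAD": ("RD", "T"),
    "AVE": ("AVE", "T"),
}

SINGLE_CHARACTER_CLASSES = {"-", "/", "&", "#", "(", ")"}


def default_class(token):
    """Assign a default class to a token that is not in the table."""
    if token in SINGLE_CHARACTER_CLASSES:
        return token
    if re.fullmatch(r"[0-9]+", token):
        return "^"          # numeric
    if re.fullmatch(r"[A-Za-z]+", token):
        return "?"          # unknown word
    if re.fullmatch(r"[0-9]+[A-Za-z]+", token):
        return ">"          # leading numeric
    if re.fullmatch(r"[A-Za-z]+[0-9]+", token):
        return "<"          # leading alpha
    if token.isalnum():
        return "@"          # complex mix
    return "~"              # special


def classify(address):
    """Return (value, class) pairs, merging adjacent unknown words."""
    classified = []
    for token in address.upper().split():
        cls = CLASS_TABLE[token][1] if token in CLASS_TABLE else default_class(token)
        if cls == "?" and classified and classified[-1][1] == "?":
            classified[-1] = (classified[-1][0] + " " + token, "?")
        else:
            classified.append((token, cls))
    return classified


def pattern_of(classified):
    """Join the classes with vertical lines to form the pattern."""
    return "|".join(cls for _, cls in classified)
```

Applied to 123 NO CHERRY HILL ROAD, `pattern_of(classify(...))` produces the pattern ^|D|?|T shown above.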

Pattern-Action File Structure


The Pattern-Action file consists of a series of patterns and associated
actions. After the input record is separated into tokens and classified,
the patterns are executed in the order they appear in the
Pattern-Action file. A pattern either matches the input record or does
not match. If it matches, the actions associated with the pattern are
executed. If not, the actions are skipped. In either case, processing
continues with the next pattern in the file.
The Pattern-Action file is an ASCII file that can be created or updated
using any standard text editor. It has the following general format:
\POST_START
post-execution actions
\POST_END
\PRAGMA_START
specification statements
\PRAGMA_END

pattern
actions


pattern
actions


There are two special sections in the Pattern-action File. The first
section consists of post-execution actions within the \POST_START
and \POST_END lines. The post-execution actions are those actions
which should be executed after the pattern matching process is
finished for the input record.
Post-execution actions include computing Soundex codes, NYSIIS
codes, reverse Soundex codes, reverse NYSIIS codes, copying,
concatenating, and prefixing dictionary field value initials.
The second special section consists of specification statements within
the \PRAGMA_START and \PRAGMA_END lines. The only
specification statements currently allowed are SEPLIST and
STRIPLIST. The special sections are optional. If omitted, the header
and trailer lines should also be omitted.
Other than the special sections, the Pattern-action File consists of sets
of patterns and associated actions. The pattern requires one line. The
actions are coded one action per line. The next pattern can start on the
following line.
Blank lines can be used to increase readability. For example, it is
suggested that blank lines or comments separate one pattern-action
set from another.
Comments follow a semicolon. An entire line can be a comment line by
specifying a semicolon as the first non-blank character; for example:
;
; This is a standard address pattern
;
^ | ? | T ; 123 Maple Ave
As an illustration of the pattern format, consider post actions of
computing a NYSIIS code for street name and processing patterns to
handle:
123 N MAPLE AVE
123 MAPLE AVE


\POST_START
NYSIIS {SN} {XS}
\POST_END
^|D|?|T ; 123 N Maple Ave
COPY [1] {HN} ; Copy House number (123)
COPY_A [2] {PD} ; Copy direction (N)
COPY_S [3] {SN} ; Copy street name (Maple)
COPY_A [4] {ST} ; Copy street type (Ave)
EXIT

^|?|T
COPY [1] {HN}
COPY_S [2] {SN}
COPY_A [3] {ST}
EXIT
Note that this example Pattern-action File has a post section that
computes the NYSIIS code of the street name (in field {SN}) and
moves the result to the {XS} field.
The first pattern matches a numeric followed by a direction followed
by one or more unknown words followed by a street type (as in 123 N
MAPLE AVE). The associated actions are to:

1. Copy operand [1] (numeric) to the {HN} house number field.


2. Copy the standard abbreviation of operand [2] to the {PD} prefix
direction field.
3. Copy the unknown words in operand [3] to the {SN} street name
field.
4. Copy the standard abbreviation of the fourth operand to the {ST}
street type field.
5. Exit the pattern program. A blank line indicates the end of the
actions for the pattern.
The second pattern-action set is similar except that this handles cases
like 123 MAPLE AVE. If there is no match on the first pattern, the next
pattern in the sequence is attempted.
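
The control flow of these two pattern-action sets can be imitated in Python. This is a deliberately simplified sketch: it compares the whole record pattern for equality, models only the copy actions and EXIT as described above, and omits the NYSIIS post action. The names `RULES` and `standardize` are hypothetical.

```python
# Each rule pairs a pattern with (action, operand number, dictionary field).
RULES = [
    ("^|D|?|T", [("COPY", 1, "HN"), ("COPY_A", 2, "PD"),
                 ("COPY_S", 3, "SN"), ("COPY_A", 4, "ST")]),
    ("^|?|T", [("COPY", 1, "HN"), ("COPY_S", 2, "SN"), ("COPY_A", 3, "ST")]),
]


def standardize(tokens):
    """Apply the first matching rule to (value, standard, class) tokens."""
    pattern = "|".join(cls for _, _, cls in tokens)
    for rule_pattern, actions in RULES:
        if rule_pattern != pattern:
            continue  # no match: skip the actions and try the next pattern
        fields = {}
        for action, operand, field in actions:
            value, standard, _ = tokens[operand - 1]
            # COPY_A copies the standard abbreviation; the other copy
            # actions copy the token value itself.
            fields[field] = standard if action == "COPY_A" else value
        return fields  # EXIT: stop processing further patterns
    return {}
```

For the tokens of 123 N MAPLE AVE, the first rule fills HN, PD, SN, and ST; for 123 MAPLE AVE, the first rule does not match and the second rule is applied.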


Rule Set Description File (.PRC)


The Rule Set Description file is an ASCII file that displays a
description of the rule set in the Available Processes list. You can
enter any text string, including special characters, for the description.
The list box provides space for a string of approximately 50 characters.
An example of the U.S. Address rule set .prc file is:
Domain-Specific Rule Set for U.S. Addresses
This file is used only in QualityStage Designer.

Lookup Tables (.TBL)


Some rule sets use one or more lookup tables. The tables contain
information that is specific to the rule set; for example, a lookup table
containing names is included in the Domain Pre-Processor rule set.
You can manually edit the contents of these tables if necessary. The
tables included are dependent on the rule set to which they belong.

Override Tables
The override tables are designed to complement the Classification
table and the Pattern-action file by providing additional instructions
during processing. The information in the override tables take
precedence over the contents of the rule set files. These tables enable
you to adjust tokenization and standardization behavior if the results
you are getting are incorrect or incomplete.
You use the Overrides dialog boxes in the QualityStage Designer to
edit the contents of the override tables. See Appendix E, “Customizing
and Testing Rule Sets” for more information.


Where Rule Set Files and Override Tables Are Located


Standardize rule sets with their files and override tables are in a rules
directory, which by default is in the directory where you installed
QualityStage Designer.
You can specify an alternate directory using the Designer Options
dialog box (see “Setting QualityStage Designer Options” on page
3-12).
WAVES/Multinational Standardize rule sets and override tables are
located in a directory on your QualityStage server. You cannot specify
an alternate location.



D

More About Using Rules

This appendix provides supplemental information about using
Standardize rule sets (Country Identifier, Domain Pre-Processor,
Domain-Specific, and Validation) and how they operate.
For primary information on how to use rule sets, see Chapter 9,
“Defining Investigate Stages” and Chapter 10, “Defining Standardize
Stages”.

Note: The descriptions in this appendix do not apply to
WAVES/Multinational Standardize rule sets.

Country Identifier Rule Set


The Country Identifier rule set is designed for situations where
multinational files are presented for standardization. The purpose of
the country identifier rule set is to assign to each input record the
appropriate two-byte ISO country code associated with the geographic
origin of the record’s address and area information.
The Country Identifier rule set is both:
• An investigation tool to determine if an input file contains
multi-national data.
• An input preparation tool to facilitate segmenting the input file
into country-specific subsets for country-specific processing.


Input File: Country Code Delimiters


The Country Identifier rule set uses a default country delimiter when
the rule set cannot determine the record’s country of origin. This
delimiter consists of a default country code, which you must define
before you run the rule in a job. You should use the country code that
you believe represents the majority of the records.
Where the country identifier rule set cannot determine the country
code, the default value will be taken from the country code delimiter
and assigned to the record.
The country code delimiter format is:
ZQ<Two-Byte ISO Country Code>ZQ
For example, the country code delimiter for the United States would
be ZQUSZQ.
The delimiter is entered in the Command Definition dialog box in the
QualityStage user interface.
See Appendix F for a complete listing of ISO Country Codes.
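The delimiter format described above can be sketched in Python. This is only an illustration: the function names are hypothetical and not part of QualityStage, and the check verifies the ZQ<code>ZQ shape only, not that the two letters are a real ISO country code.

```python
import re

# Shape of a country code delimiter: ZQ + two letters + ZQ.
_DELIMITER = re.compile(r"ZQ([A-Z]{2})ZQ")


def make_country_delimiter(iso_code: str) -> str:
    """Build a delimiter such as ZQUSZQ from a two-letter country code."""
    code = iso_code.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}", code):
        raise ValueError(f"expected a two-letter country code, got {iso_code!r}")
    return f"ZQ{code}ZQ"


def parse_country_delimiter(delimiter: str) -> str:
    """Return the two-letter code embedded in a delimiter such as ZQUSZQ."""
    match = _DELIMITER.fullmatch(delimiter.strip())
    if match is None:
        raise ValueError(f"not a country code delimiter: {delimiter!r}")
    return match.group(1)
```

For example, make_country_delimiter("us") builds the United States delimiter shown above, and parse_country_delimiter reverses the operation.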

Output File
The rule set creates an output file in which the following fields are
appended to the beginning of each input record:
• A two-byte ISO country code. The code is associated with the
geographic origin of the record’s address and area information.
• An Identifier flag. The values are:

Flag Description
Y The rule set was able to identify the country.
N The rule set was not able to identify the country and used the
default value that you set as the default country delimiter.
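The relationship between the identified country, the default country code, and the Identifier flag can be sketched as follows. The function is hypothetical, and it returns the three values as a tuple because the physical layout of the appended fields is not described here.

```python
from typing import Optional, Tuple


def country_output_fields(
    record: str, identified_code: Optional[str], default_code: str
) -> Tuple[str, str, str]:
    """Return (country code, identifier flag, original record).

    identified_code is the code the rule set found, or None when it could
    not determine the country. In that case the default code taken from
    the country code delimiter is used and the flag is N.
    """
    if identified_code:
        return identified_code, "Y", record
    return default_code, "N", record
```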


Domain Pre-Processor Rule Sets


The Domain Pre-Processor rule sets evaluate mixed-domain input
(free-form name and address information) and categorize the data into
domain-specific column sets. After the proper domains are identified,
you can use the Domain-Specific rule sets to create the appropriate
standardized structures. These rule sets evaluate the mixed-domain
input from a file for a specific country.
These rule sets do not perform standardization. Instead, they parse the
fields within each record and filter each token into one of the
appropriate domain-specific column sets (Name, Address, or Area).
The results of this rule set are appended Domain-Specific column sets
for Name, Address, and Area information. The Name, Address, and
Area column sets are the input fields to the respective
Domain-Specific rule sets.

Input File
The Domain Pre-Processor rule sets do not assume a data domain
with a field position. Therefore, you must insert at least one metadata
delimiter for a field in your input record. It is strongly recommended
that you delimit every field or group of fields. The delimiter indicates
what kind of data you are expecting to find in the field based on one or
more of the following:
• Metadata description
• Investigation results
• An informed estimate
The delimiter names are:

Delimiter name Description


ZQNAMEZQ Name delimiter
ZQADDRZQ Address delimiter
ZQAREAZQ Area delimiter
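As a rough illustration of delimiting every field, the following Python sketch builds one input string from a list of fields and the domain you expect each to contain. It assumes that each delimiter precedes the field it describes and that the pieces are separated by single spaces; the function name and that layout are assumptions made for the example, not a description of how QualityStage assembles the input.

```python
# Metadata delimiters listed in the table above.
DELIMITERS = {
    "name": "ZQNAMEZQ",
    "address": "ZQADDRZQ",
    "area": "ZQAREAZQ",
}


def delimit_fields(fields):
    """fields is a list of (expected_domain, value) pairs, in input order."""
    parts = []
    for domain, value in fields:
        parts.append(DELIMITERS[domain])  # delimiter first (assumed layout)
        parts.append(value.strip())
    return " ".join(parts)
```

The expected domain for each field would come from the metadata description, investigation results, or an informed estimate, as described above.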


Why You Use the Domain Pre-Processor Rule Sets


Since input files are usually not domain-specific, these rule sets are
critical when preparing a file for standardization. Fields can contain
data that do not match their metadata description. Here is an
example:

Metadata Label Data Content


Name 1 John Doe
Name 2 123 Main Street Apt. 456
Address 1 C/O Mary Doe
Address 2 Boston, MA 02111

Where the domains are:

Domain Name Data Content


Name John Doe
Name C/O Mary Doe
Address 123 Main Street Apt. 456
Area Boston, MA 02111

In addition, other problems arise when:


• Information continues across multiple column sets.
• More than one data domain is present within a single column set.
For example:

Domain Name Data Content


Name 1 John Doe and Mary
Name 2 Doe 123 Main Str
Address 1 eet Apt. 456 Boston
Address 2 MA 02111


The domains are:

Domain Name Data Content


Name John Doe and Mary Doe
Address 123 Main Street Apt. 456
Area Boston, MA 02111

As the column sets and metadata labels do not necessarily provide
hard information about data content, preprocessing categorizes the
input data into domain-specific column sets: Name, Address, and
Area.

Domain Pre-Processor File Names


Each rule set has a group of files associated with it. See Appendix C,
“Rule Set Files”, for more information about how they work. The
naming convention for the Domain Pre-Processor rule sets is:

Position Description Values


1-2 ISO country code US: United States
GB: Great Britain
CA: Canada
AU: Australia
DE: Germany
ES: Spain
FR: France
IT: Italy
3-6 Domain Pre-Processor abbreviation PREP
Extensions Type of file .CLS: Classification
.DCT: Dictionary
.PAT: Pattern-action
.PRC: Description


For example, here are the files in the United States Domain
Pre-Processor rule set:
USPREP.CLS Classification Table
USPREP.DCT Dictionary File
USPREP.PAT Pattern-Action File
USPREP.PRC Rule Set Description File
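The naming convention can be expressed as a small Python helper. The helper name is hypothetical; the country codes, the PREP abbreviation, and the extensions come from the table above.

```python
# File extensions and their meanings, from the naming convention table.
PREP_EXTENSIONS = {
    ".CLS": "Classification Table",
    ".DCT": "Dictionary File",
    ".PAT": "Pattern-Action File",
    ".PRC": "Rule Set Description File",
}


def prep_rule_set_files(country_code: str):
    """List the file names of a Domain Pre-Processor rule set."""
    base = country_code.upper() + "PREP"  # positions 1-2 and 3-6
    return [base + extension for extension in PREP_EXTENSIONS]
```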

Domain Pre-Processor Dictionary File


There are two types of fields in the Dictionary file of a Domain
Pre-Processor rule set:
• Domain fields.
• Reporting fields.

Domain Fields
Domain Pre-Processor rule sets move every input token to one of the
following domain fields:

Field Name Description


NA Name Domain All input tokens belonging to the name
domain
AD Address Domain All input tokens belonging to the
address domain
AR Area Domain All input tokens belonging to the area
domain


Reporting Fields
Domain Pre-Processor rule sets provide reporting fields for quality
assurance and post-standardization investigation. All Domain
Pre-Processor rule sets have the following reporting fields:

Field Name Description


P1 Field Pattern 1 The pattern generated for the first
delimited field of input tokens based on
the parsing rules and token
classifications.
P2 Field Pattern 2 As above for the second field.
P3 Field Pattern 3 As above for the third field.
P4 Field Pattern 4 As above for the fourth field.
P5 Field Pattern 5 As above for the fifth field.
P6 Field Pattern 6 As above for the sixth field.
IP Input Pattern The pattern generated for the entire
stream of input tokens based on the
parsing rules and token classifications.
OP Outbound Pattern The pattern image for all tokens just
prior to being written to the output file.
UO User Override Flag A flag indicating what type of user
override was applied to this record.
CF Custom Flag Unused. Available for users to create a
flag needed in their application.

See the next table for descriptions of the user flag fields.


User Flag Descriptions


The following table describes the value of flags in the Domain
Pre-Processor rule sets:

Flag Value Description


UO NO (Default) No user re-masks were used
IP An Input Pattern user re-mask was
used
IT An Input Text user re-mask was used
FP A Field Pattern user re-mask was used
FT A Field Text user re-mask was used

Domain Masks
The Domain Pre-Processor attempts to assign a domain mask to each
input token. All pattern-actions retype tokens to one of the domain
masks, which are:

Mask Field Description


A ADDRESS
N NAME
R AREA

The final step in the Domain Pre-Processor is to output the tokens to
the domain fields based on their assigned (or defaulted) domain mask.
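The final output step can be pictured with the following Python sketch, which routes tokens to the NA, AD, and AR domain fields according to their masks. The function is illustrative only. Which mask is used as the default is not stated here, so the caller must supply it.

```python
# Domain mask -> domain field, from the tables above.
MASK_TO_FIELD = {"N": "NA", "A": "AD", "R": "AR"}


def route_tokens(masked_tokens, default_mask):
    """masked_tokens is a list of (token, mask) pairs in input order.

    A token whose mask is missing or unrecognized falls back to
    default_mask (the choice of default is an assumption of this sketch).
    """
    fields = {"NA": [], "AD": [], "AR": []}
    for token, mask in masked_tokens:
        field = MASK_TO_FIELD.get(mask, MASK_TO_FIELD[default_mask])
        fields[field].append(token)
    return {name: " ".join(tokens) for name, tokens in fields.items()}
```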

Upgrading Pre-Processor Rule Sets


When a new release is available, or when changes are made to the
delivered rule sets, you can incorporate the improvements into an
existing application.


You should not update:


• The rule set description file
• The dictionary file
• The user override tables.
These files are specific to the application. Replacing them could
cause the application to fail or to produce different results.
For USPREP, they would be:

USPREP.PRC Rule Set Description File


USPREP.DCT Dictionary File
USPREP.UCL User Classification Table
USPREPIT.TBL User Override Input Text Table
USPREPIP.TBL User Override Input Pattern Table
USPREPFT.TBL User Override Field Text Table
USPREPFP.TBL User Override Field Pattern Table

You should update:


• The classification table
• The pattern-action file
• The lookup tables.
They may have changed from the previous version.
For USPREP, they would be:

USPREP.CLS Classification Table


USPREP.PAT Pattern-Action File
USCITIES.TBL US City Lookup Table
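The keep-or-replace decision described above can be sketched as a Python function that classifies a file by its name. The function name and the returned labels are hypothetical. The sketch treats any other .TBL file in the rule set as a lookup table, which is an assumption.

```python
# Two-character suffixes of the Domain Pre-Processor override tables.
OVERRIDE_SUFFIXES = ("IT", "IP", "FT", "FP")


def upgrade_action(file_name: str, rule_set: str) -> str:
    """Return 'keep', 'replace', or 'unknown' for one rule set file."""
    stem, _, extension = file_name.upper().rpartition(".")
    rule_set = rule_set.upper()
    # Description, dictionary, and user classification files are
    # application-specific and should not be updated.
    if stem == rule_set and extension in ("PRC", "DCT", "UCL"):
        return "keep"
    # User override tables: rule set name plus a two-character type.
    if (extension == "TBL" and stem.startswith(rule_set)
            and stem[len(rule_set):] in OVERRIDE_SUFFIXES):
        return "keep"
    # Classification table and pattern-action file may have changed.
    if stem == rule_set and extension in ("CLS", "PAT"):
        return "replace"
    # Any other .TBL file is assumed to be a lookup table.
    if extension == "TBL":
        return "replace"
    return "unknown"
```

Note that "replace" for the pattern-action file does not mean a blind copy; user changes still have to be carried over to the new file.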

Since any changes to the classifications should have been made
through the user classification table (.UCL), you should be able simply
to replace the classification table. However, we recommend that a
comparison be done to make sure that what is currently on the
application's classification table is also on the newly-delivered one.
The same goes for any lookup tables: you should be able to just copy
over the previous one, but it is safer to compare the two before you do
so.
The pattern-action file is more complicated to upgrade. During
development of the application, pattern-action changes should have
been made in two subroutines: Input_Modifications and Field_Modifications.
If this is the case, copy those subroutines from the existing
pattern-action file and paste them into the new pattern-action file in
place of the empty subroutines of the same names.
Often, other changes are made outside of the modification
subroutines. You must also add those changes to the new
pattern-action file.

Note: The development of the application is based on the rules it is
currently using, so any rules changes could impact the output. If
you must upgrade the rules in an existing application, we advise
that extensive testing be done before you run the application in
production.

Domain-Specific Rule Sets


These rule sets evaluate the domain-specific input from a file for a
specific country (the U.S., for example). There are three
Domain-Specific rule sets for each country:

Rule Set Name Description


NAME Individual and business names
ADDR Street name, number, unit, and other address
information
AREA City, state, region, and other locale information


Domain-Specific File Names


Each rule set has a group of files associated with it. See Appendix C,
“Rule Set Files”, for more information about how they work. The
naming convention for Domain-Specific rule sets is:

Position Description Values


1-2 ISO country code US: United States
GB: Great Britain
CA: Canada
AU: Australia
DE: Germany
ES: Spain
FR: France
IT: Italy
3-6 Type of rule NAME: Name
ADDR: Address
AREA: Area
Extensions Type of file .CLS: Classification
.DCT: Dictionary
.PAT: Pattern-action
.PRC: Description

For example, here are the files in the United States NAME rule set:
USNAME.CLS Classification Table
USNAME.DCT Dictionary File
USNAME.PAT Pattern-Action File
USNAME.PRC Rule Set Description File

Domain-Specific Dictionary Files


There are three types of fields in the Dictionary file of a
Domain-Specific rule set:
• Business Intelligence fields.
• Matching fields.
• Reporting fields.

Business Intelligence Field


Business intelligence fields help focus on critical information
contained within data. Different domains have different business
intelligence fields. For example, these are the business intelligence
fields in the USAREA dictionary:

Field Field Description


CN City Name
SA State Abbreviation
ZC ZIP Code
Z4 ZIP4 Add-on Code
CC Country Code

Matching Fields
Domain-Specific rule sets create data structures that facilitate
effective data matching. Different domains will have different
matching fields. The most common matching fields are phonetic keys
for primary fields. Here is an example:

Field Field Description


NC City NYSIIS
SC City Reverse Soundex
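To give a feel for what a phonetic matching key looks like, the following Python sketch implements the widely used American Soundex code and derives a reverse Soundex by encoding the reversed string. This is an illustration only: it is not the QualityStage implementation, the exact variant that QualityStage uses may differ, and NYSIIS is not shown.

```python
# Consonant groups of the standard American Soundex scheme.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(word: str) -> str:
    """Return the four-character Soundex key, or '' for empty input."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    previous = _CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = _CODES.get(letter, "")
        if digit and digit != previous:
            key += digit
        if letter not in "HW":  # H and W do not separate equal codes
            previous = digit
    return (key + "000")[:4]


def reverse_soundex(word: str) -> str:
    """Soundex of the reversed word (assumed definition)."""
    return soundex(word[::-1])
```

Two spellings that sound alike, such as Robert and Rupert, receive the same key, which is what makes such keys useful for matching.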


Reporting Fields
Domain-Specific rule sets provide reporting fields for quality assurance
and post-standardization investigation. These rule sets have the
following reporting fields:

Name                                  Description
Unhandled Pattern {UP}                The pattern generated for the remaining tokens not processed by the rule set based on the parsing rules, token classifications, and any additional manipulations by the pattern-action language.
Unhandled Data {UD}                   The remaining tokens not processed by the rule set, with one character space between each token.
Input Pattern {IP}                    The pattern generated for the stream of input tokens based on the parsing rules and token classifications.
Exception Data {ED}                   The tokens not processed by the rule set because they represent a data exception. Data exceptions may be tokens that do not belong to the domain of the rule set or are invalid or default values.
User Override Flag {UO}               A flag indicating what type of user override was applied to this record.
User Re-Code Dropped Data Flag {U5}   A flag indicating whether the current record was affected by a user re-code that specified the dropping (deleting) of one or more input tokens.

Data Flag Table


The following table describes the value of flags in the reporting fields:

Flag Value Explanation


UO NO (Default) No user re-codes were used.
IP An Input Pattern user re-code was used.
IT An Input Text user re-code was used.
UP An Unhandled Pattern user re-code was used.
UT An Unhandled Text user re-code was used.


Validation Rule Sets


The Validation rule sets are used to standardize common business
data including:
• Date
• Email Address
• Phone Number
• Taxpayer ID/Social Security Number
These rules are configured for U.S. formats. The rule sets output two
types of fields: Business Intelligence fields and Reporting/Error fields.

Validation File Names


The naming convention is:

Position Description Values


1 Validation rule abbreviation V
2 to n Type of rule DATE
EMAIL
PHONE
TAXID
Extensions Type of file .CLS: Classification
.DCT: Dictionary
.PAT: Pattern-action
.PRC: Description

For example, here are the files in the DATE rule set:
VDATE.CLS Classification Table
VDATE.DCT Dictionary File
VDATE.PAT Pattern-Action File
VDATE.PRC Rule Set Description File


VDATE Rule Set


The VDATE rule set validates the value and standardizes the format
of a date field. Note the following:
• Punctuation, such as hyphens or slashes, is removed during the
parsing step.
• The rule set outputs two types of fields, Business Intelligence fields
and Reporting/Error fields.
• There are no significant default Classification table entries used
with this rule set.
• The standard output format is CCYYMMDD.

Default Parsing Parameters


The default parsing parameters are:
SEPLIST <space character> ,;.%:&*\"/+\\()-"
STRIPLIST <space character> ,;.:*\"\\()

Input Date Formats


Expected input date formats are any of the following:

Format Example
mmddccyy 09211991
mmmddccyy OCT021983
mmmdccyy OCT21983
mmddccyy 04101986
mm/dd/ccyy 10/23/1960
m/d/ccyy 1/3/1960
mm/d/ccyy 10/3/1960
m/dd/ccyy 1/13/1960
mm-dd-ccyy 04-01-1960

m-d-ccyy 1-3-1960
mm-d-ccyy 10-3-1960
m-dd-ccyy 1-13-1960
ccyy-mm-dd 1990-10-22

Output Example

Input string Output Result


1990-10-22 19901022
1/13/1960 19600113
OCT021983 19831002
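The conversion from the listed input formats to CCYYMMDD can be approximated in Python as shown below. This is a sketch under stated assumptions, not the VDATE pattern-action logic: it works on the raw input string rather than on parsed tokens, it uses English three-letter month abbreviations, and it returns None for anything it cannot validate instead of setting an Invalid Reason code.

```python
import re
from datetime import date

_MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# One pattern per family of input formats listed above.
_PATTERNS = [
    re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"),
    re.compile(r"(?P<month>\d{1,2})(?P<sep>[/-])(?P<day>\d{1,2})"
               r"(?P=sep)(?P<year>\d{4})"),
    re.compile(r"(?P<month>[A-Z]{3})(?P<day>\d{1,2})(?P<year>\d{4})"),
    re.compile(r"(?P<month>\d{2})(?P<day>\d{2})(?P<year>\d{4})"),
]


def standardize_date(text: str):
    """Return the date as CCYYMMDD, or None if it is not a valid date."""
    value = text.strip().upper()
    for pattern in _PATTERNS:
        match = pattern.fullmatch(value)
        if match is None:
            continue
        month_text = match["month"]
        if month_text.isalpha():
            month = _MONTHS.get(month_text)
            if month is None:
                return None
        else:
            month = int(month_text)
        year, day = int(match["year"]), int(match["day"])
        try:
            date(year, month, day)  # rejects month 13, February 30, ...
        except ValueError:
            return None
        return f"{year:04d}{month:02d}{day:02d}"
    return None
```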

Business Intelligence Output Fields


If a data value passes validation, this rule set populates these two
Business Intelligence fields:
• The valid date data {VD} field
• The valid flag {VF}
The {VD} field is populated with the eight numeric bytes, and the {VF}
field is populated with the value ‘T’.


Error Reporting Output Fields


If a data value fails validation, this rule set populates the following
Reporting Error fields:

Field Value Description


Invalid Data     Invalid data value   The value that did not meet the validation requirements.
Invalid Reason   IF                   Invalid format (not one of the formats listed above)
                 IM                   Invalid month (for example, JJJ011976 instead of JAN011976)
                 IT                   Date is on the invalid table (for example, 11111111)
                 MM                   Invalid numeric month (for example, 13/10/1988)
                 FB                   Invalid day for February (for example, 02/30/1976)
                 M0                   Invalid day for months with 30 days
                 M1                   Invalid day for months with 31 days

VEMAIL Rule Set


The VEMAIL rule set identifies the format, components and
completeness of email addresses. Note the following:
• All email addresses should have a user, domain, and top-level
qualifier.
• Punctuation such as hyphens (-), at signs (@), and periods (.) are
used as key delimiters during the parsing step.


• The default classification table for this rule set contains common
domain qualifiers (for instance, ORG, COM, EDU, and GOV) and
sub-domain qualifiers (for instance, country and state codes).

Default Parsing Parameters


The default parsing parameters are:
SEPLIST <space character>`@.
STRIPLIST <space character> `

Parsing Examples
The parsing parameters will parse the address into multiple tokens as
in the following examples:

Input String Token Token Number


John_Smith@abccorp.com John_Smith 1
abccorp 2
com 3
kjones@example.org kjones 1
example 2
org 3

The @ and . are used to separate the data. They are removed
during the parsing process.
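The parsing examples above can be reproduced with a short Python sketch that splits on the separator characters and discards them. The function is illustrative only; it follows the behavior described in the text (separators are removed) rather than modeling the SEPLIST and STRIPLIST settings exactly.

```python
import re

# Separators taken from the SEPLIST above: space, backquote, @, period.
_SEPARATORS = re.compile(r"[ `@.]")


def tokenize_email(address: str):
    """Split an email address into tokens, dropping the separators."""
    return [token for token in _SEPARATORS.split(address) if token]
```

Because the underscore is not a separator, John_Smith stays a single token.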

Note: As Standardize does not re-assemble the multiple tokens into a
single value and token before processing, you should append the
input email addresses to the end of the data.

Business Intelligence Output Fields


If a data value is validated, this rule set populates these Business
Intelligence fields:
• User {US}
• Domain {DM}
• Top-level Qualifier {TL}
• URL {RL}

Error Reporting Output Fields


If a data value fails validation, this rule set outputs these Reporting
Error fields:

Field Value Description


Unhandled Data       Unhandled data value   The value that did not meet the validation requirements.
Unhandled Patterns   Unhandled pattern      The pattern that did not meet the validation requirements.

VPHONE Rule Set


The VPHONE rule set validates the value and standardizes the
format of a U.S. phone number.
Punctuation, such as hyphens (-) and parentheses ( ), is removed
during the parsing step.


Classification Table Values


The default classification table (VPHONE.CLS) for this rule set
contains three values which may represent the extension part of a
phone number.

Token Standard Value Class


X X X
EXT EXT X
EXTENSION EXT X

Default Parsing Parameters


The default parsing parameters are:
SEPLIST <space character>, ; . % : & * \ " / + \ \ ( ) - _
STRIPLIST <space character>, ; . : / * \ " \ \ ( ) - _

Parsing Examples
The following table shows examples of how phone numbers are
parsed:

Input String Token Token Number


(617) 338-0300 617 1
338 2
0300 3
(617) 338-0300 EXT 316 617 1
338 2
0300 3
EXT 4
316 5
617-338-0300 X316 617 1


338 2
0300 3
X316 4
316 5

The hyphen, space, and parentheses are used to separate the data.
After the data is parsed the hyphen, spaces, and parentheses are
dropped.
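The interplay of the two parsing parameters can be sketched in Python. The sketch assumes that every SEPLIST character ends the current token, that a separator also listed in STRIPLIST is discarded, and that a separator not listed in STRIPLIST is kept as a token of its own. The function name is hypothetical, and the two character sets are transcribed from the default parsing parameters above.

```python
# Transcribed from the VPHONE default parsing parameters (assumption:
# the listing shows each character once, separated by spaces).
SEPLIST = " ,;.%:&*\"/+\\()-_"
STRIPLIST = " ,;.:/*\"\\()-_"


def tokenize(text: str, seplist: str = SEPLIST, striplist: str = STRIPLIST):
    """Split text into tokens using separator and strip character lists."""
    tokens, current = [], ""
    for character in text:
        if character in seplist:
            if current:  # a separator closes the token in progress
                tokens.append(current)
                current = ""
            if character not in striplist:
                tokens.append(character)  # kept as its own token
        elif character not in striplist:
            current += character
    if current:
        tokens.append(current)
    return tokens
```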

Validation Logic
The VPHONE rule set validates patterns and values based on the
following criteria:
• The value has 7 or 10 numeric bytes. It can be longer than 10 bytes
if it includes an extension.
• The first three bytes are not all zeros (000). If they are all zeros,
they are replaced with blanks.
• The value is not listed on the ‘invalid table’, INVPHONE.TBL as
shown here:

0000000000
1111111111
2222222222
3333333333
4444444444
5555555555
6666666666
7777777777
8888888888
9999999999
1234567
5551212
1111111


2222222
3333333
4444444
5555555
6666666
7777777
8888888
9999999
0000000

If the data value fails any one of the validation requirements, the
‘Invalid Data’ and the ‘Invalid Reason’ fields are populated.

Business Intelligence Output Fields


If the data value passes validation, this rule set outputs these
Business Intelligence field values:
• Valid Phone Number {VD}
• Phone Number Extension {VX}
• Valid flag {VF}
The {VD} field is populated with the numeric phone number, the {VX}
field with the extension number, and the {VF} field with the value ‘T’.


Error Reporting Output Fields


If a data value fails to pass validation, this rule set outputs the
following Reporting Error fields:

Field Value Description


Invalid Data {ID} Invalid data The value that did not meet the
value validation requirements.
Invalid Reason {IR} IL Invalid length. Main phone
(without extension) must be 7 or 10
bytes.
IT Invalid value matched to invalid
table (INVPHONE.TBL) entry.
IP Invalid pattern/format. Main
phone (without extension) must be
10 bytes numeric. This logic can be
commented out if alphas are to be
considered valid values.
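The validation criteria and reason codes can be summarized in a Python sketch. It is a simplified illustration, not the VPHONE pattern-action logic: the order in which the checks are applied is an assumption, the special handling of a leading 000 is not modeled, and the function and variable names are hypothetical.

```python
import re

# Entries of INVPHONE.TBL as listed above.
INVALID_PHONES = (
    {digit * 10 for digit in "0123456789"}
    | {digit * 7 for digit in "0123456789"}
    | {"1234567", "5551212"}
)
# Tokens classified as extension markers in VPHONE.CLS.
EXTENSION_MARKERS = {"X", "EXT", "EXTENSION"}


def validate_phone(text: str) -> dict:
    """Return the VD, VX, VF, ID, and IR output fields for one value."""
    tokens = re.findall(r"[A-Z]+|\d+", text.upper())
    main, extension = tokens, []
    for index, token in enumerate(tokens):
        if token in EXTENSION_MARKERS:
            main, extension = tokens[:index], tokens[index + 1:]
            break
    number = "".join(main)
    result = {"VD": "", "VX": "", "VF": "", "ID": "", "IR": ""}
    if len(number) not in (7, 10):
        reason = "IL"  # invalid length
    elif not number.isdigit():
        reason = "IP"  # invalid pattern/format
    elif number in INVALID_PHONES:
        reason = "IT"  # found on the invalid table
    else:
        reason = ""
    if reason:
        result["ID"], result["IR"] = number, reason
    else:
        result.update(VD=number, VX="".join(extension), VF="T")
    return result
```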

Note: 555-1212 is a directory assistance number within each area code
in the United States, and therefore would be an invalid phone
number for customers, suppliers or other entities.

Examples
The following table shows sample input data and the output they
produce:

Input String     Valid Data {VD}   Valid Flag {VF}   Invalid Data {ID}   Invalid Reason {IR}
0001234567                                           0001234567          IT
(617) 338-0300   6173380300        T
617-338-0300     6173380300        T


VTAXID Rule Set


The VTAXID rule set validates the value and standardizes the format
of a tax ID or Social Security number. Note the following:
• Punctuation, such as hyphens (-), is removed during the parsing
step.
• There are no significant default Classification table entries used
with this rule set.

Default Parsing Parameters


The default parsing parameters are:
SEPLIST <space character>, ; . % : & * \ " / + \ \ ( ) -
STRIPLIST <space character>, ; . : / * \ " \ \ ( ) -

Parsing Examples
The following table shows examples of how tax IDs and Social
Security numbers are parsed:

Input String Token Token Number


051-34-8198 051 1
34 2
8198 3
193837485 193837485 1

The hyphen, space, and parentheses are used to separate the data.
After the data is parsed the hyphen, spaces, and parentheses are
deleted.


Validation Logic
The rule set validates patterns and values based on the following
criteria:
• The value has nine numeric characters.
• The first three bytes are not all zeros (000).
• The value is not listed on the Invalid table (INVTAXID.TBL) as
shown here:

000000000
111111111
222222222
333333333
444444444
555555555
666666666
777777777
888888888
999999999
123456789
987654321
111223333

Business Intelligence Output Fields


If a data value meets the listed criteria, it is considered valid and
the rule set outputs two Business Intelligence field values:
• TAX_ID/SSN valid data {VD}
• Valid flag {VF}
The {VD} field is populated with the nine numeric bytes, and the {VF}
field is populated with the value ‘T’.


Error Reporting Output Fields


If the data value fails any one of the validation requirements, the
Invalid Data and the Invalid Reason fields are populated.

Field Value Description


Invalid Data Invalid data The value that did not meet the
value validation requirements.
Invalid Reason IP The data value did not contain nine,
and only nine, numeric characters.
IT The data value was found on the
Invalid Data table.
Z3 The first three numeric characters are
all zeros.

Examples
The following table shows sample input data and the output they
produce:

Input String   Valid Data {VD}   Valid Flag {VF}   Invalid Data {ID}   Invalid Reason {IR}
000123456                                          000123456           Z3
193837485      193837485         T
193-83-7485    193837485         T
111111111                                          111111111           IT
222-22-2222                                        222222222           IT
A12-O9-1234                                        A12O91234           IP
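The validation logic and the examples above can be approximated by the following Python sketch. It is not the VTAXID pattern-action logic: the function name is hypothetical, and the order of the checks (pattern, then invalid table, then leading zeros) is an assumption chosen so that the sample rows produce the same reason codes.

```python
import re

# Entries of INVTAXID.TBL as listed above.
INVALID_TAX_IDS = (
    {digit * 9 for digit in "0123456789"}
    | {"123456789", "987654321", "111223333"}
)
# Characters removed during parsing (from the STRIPLIST above).
_STRIP_CHARACTERS = set(' ,;.:/*"\\()-')


def validate_tax_id(text: str) -> dict:
    """Return the VD, VF, ID, and IR output fields for one value."""
    value = "".join(c for c in text if c not in _STRIP_CHARACTERS)
    result = {"VD": "", "VF": "", "ID": "", "IR": ""}
    if not re.fullmatch(r"[0-9]{9}", value):
        reason = "IP"  # not exactly nine numeric characters
    elif value in INVALID_TAX_IDS:
        reason = "IT"  # found on the invalid table
    elif value.startswith("000"):
        reason = "Z3"  # first three characters are all zeros
    else:
        reason = ""
    if reason:
        result["ID"], result["IR"] = value, reason
    else:
        result["VD"], result["VF"] = value, "T"
    return result
```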



E

Customizing and Testing Rule Sets

QualityStage standardization jobs use rule sets, which contain criteria
on how input data is processed.
The pre-built rule sets delivered with QualityStage are configured for
optimal performance for the Standardize and the Multinational
Standardize stages, as well as for the WAVES stage. However, you
can customize rule sets to suit your requirements.
QualityStage also provides you with a tool by which you can test most
of your rules before putting them into production.
This appendix describes how to use QualityStage tools to efficiently
customize rule sets and how to test them.

Important: We strongly advise that you read “Rule Set Files” on page C-1
before you try to customize rule sets.

Rule Set Customization


Using rule override tables, you can customize:
• Standardize rule sets, which are described in Chapter 10,
“Defining Standardize Stages”. These include:
• Domain Pre-Processor rule sets
• Domain-Specific rule sets
• Validation rule sets
• WAVES/Multinational Standardize rule sets, which are
described in Chapter 11, “Defining Multinational Standardize
Stages”. Note that the address standardization process in
QualityStage WAVES uses the Multinational Standardize rule
sets.

Note: You can override only street-level data within the
WAVES/Multinational Standardize rule sets; you cannot
override area data such as city or region.

See “Using Override Tables to Customize Rule Sets” on page E-2 for
information on how to use rule override tables.
You can also make more complex modifications to Standardize rule sets
using subroutines, which are described in “User Modification
Subroutines” on page E-40.

Rule Set Testing


You can test Standardize rule sets (including ones that have been
customized) using the Rules Analyzer. After customizing a business
rule set, you first should test it to ensure that it defines the data
re-engineering business rule that you need.
This feature cannot be used with WAVES/Multinational Standardize
rule sets.
See “Testing Standardization Rule Sets” on page E-34 for information
on how you use the Rules Analyzer.

Using Override Tables to Customize Rule Sets


Rule sets for the Standardize, Multinational Standardize, and
WAVES stages define the way QualityStage processes the input data.
These standardization processes use Classification tables and
Pattern-Action files. You can change their content using override
tables.

The override tables:


• Let you specify your own custom conditioning rules, which are
stored separately from the Pattern-Action file and Classification
table.
• Are accessible in the QualityStage user interface; you do not need
to know pattern-action language nor do you need to open the
pattern-action file.
• Let you port custom modifications when upgrading to a new
version of QualityStage.
Each rule set contains five override tables, which are listed in the next
three sections. Note the following:
• The Standardize Pre-processor and Domain-specific override
tables have eight-character names, which are the rule set file
names with two characters appended indicating the override table
type.
• The Validation override tables use the rule set file names with two
characters appended indicating the override table type.
• The WAVES/Multinational Standardize override tables are eight
characters. The first two characters are the ISO country code
abbreviation, the second two are “MN” for multinational, the third
two are “AD” for address, and the last two indicate the override
table type.
• User Classification tables do not have a table-type abbreviation in
their names.
• All override tables have a .TBL extension except for User
Classification, which uses a .UCL extension.
• When you first install QualityStage, the override tables are empty;
they contain no data.


Domain Pre-Processor Override Tables


The following chart describes the tables used to override the
Standardize Domain Pre-Processor rule sets.

Domain Pre-Processor Override Table Names   Table Type Abbreviation   Example (United States)
User Classification                         N/A                       USPREP.UCL
Input Pattern                               IP                        USPREPIP.TBL
Input Text                                  IT                        USPREPIT.TBL
Field Pattern                               FP                        USPREPFP.TBL
Field Text                                  FT                        USPREPFT.TBL

Domain-Specific Override Tables


The following chart describes the tables used to override the
Standardize Domain-Specific rule sets:

Domain-Specific Override Table Names   Table Type Abbreviation   Example (United States Address)
User Classification                    N/A                       USADDR.UCL
Input Pattern                          IP                        USADDRIP.TBL
Input Text                             IT                        USADDRIT.TBL
Unhandled Pattern                      UP                        USADDRUP.TBL
Unhandled Text                         UT                        USADDRUT.TBL


Validation Override Tables


The following chart describes the tables used to override the
Standardize Validation rule sets:

Validation Override Table Names   Table Type Abbreviation   Example (Tax ID)
User Classification               N/A                       VTAXID.UCL
Input Pattern                     IP                        VTAXIDIP.TBL
Input Text                        IT                        VTAXIDIT.TBL
Unhandled Pattern                 UP                        VTAXIDUP.TBL
Unhandled Text                    UT                        VTAXIDUT.TBL

QualityStage WAVES/Multinational Address Override Tables


The following chart describes the tables used to override the
WAVES/Multinational Standardize rule sets:

Multinational Address    Table Type      Example
Override Table Names     Abbreviation    (United States)
User Classification      N/A             USMNAD.UCL
Input Pattern            IP              USMNADIP.TBL
Input Text               IT              USMNADIT.TBL
Unhandled Pattern        UP              USMNADUP.TBL
Unhandled Text           UT              USMNADUT.TBL

Working with Multiple Projects


The information you enter in override tables is applied to rule sets
across all projects.

When working with multiple projects that require different overrides,
it’s a good idea to first make a copy of the default override table you
wish to change, and then edit the copy. Using this method enables you
to create override tables that apply to a specific project.
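
As a rough sketch of the copy step, the following Python helper copies a default override table into a project-specific folder and leaves the original untouched. The folder layout and the helper name are assumptions made for illustration; this guide does not prescribe where project copies should live.

```python
import shutil
import tempfile
from pathlib import Path


def copy_override_table(default_table, project_dir):
    """Copy a default override table into a project-specific folder.

    Returns the path of the copy; the original file is not modified.
    """
    source = Path(default_table)
    target_dir = Path(project_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)
    return target


# Demonstration using a throwaway directory in place of a real rules folder.
with tempfile.TemporaryDirectory() as rules_dir:
    default_table = Path(rules_dir) / "USADDRIP.TBL"
    default_table.write_text("")   # override tables are empty after installation
    copy = copy_override_table(default_table, Path(rules_dir) / "project_a")
    print(copy.name)               # USADDRIP.TBL
```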

Domain Pre-Processor Rule Set Process


The logical process flow of the Domain Pre-Processor Pattern-Action
file is shown in the following diagram:

Pattern Action File (.PAT)

1  Input Overrides (user overrides)
     1A  Classification Overrides
     1B  Input Text Overrides
     1C  Input Pattern Overrides
2  Input User Subroutine Modifications
3  Continuation User Subroutine Modifications
4  Main Pattern Action (Part 1): Field Continuations
     (MIXED, OTHR, AREA, ADDR, NAME)
5  Field Overrides (user overrides)
     5A  Field Text Overrides
     5B  Field Pattern Overrides
6  Field User Subroutine Modifications
7  Main Pattern Action (Part 2): Field Typing
     (MIXED, OTHR, AREA, ADDR, NAME)

Domain-Specific, Validation, and WAVES/Multinational Address Rule Set Process


The logical process flow of the Domain-Specific, Validation, and
WAVES/Multinational Standardize Pattern-Action files is shown in
the following diagram:

Pattern Action File (.PAT)

1  Input Overrides (user overrides)
     1A  Classification Overrides
     1B  Input Text Overrides
     1C  Input Pattern Overrides
2  Input User Subroutine Modifications
3  Common Patterns
4  Main Pattern Action
5  Unhandled Overrides (user overrides)
     5A  Unhandled Text Overrides
     5B  Unhandled Pattern Overrides
6  Unhandled User Subroutine Modifications


Using Overrides to Customize Rule Sets


Through the Override dialog box, you modify the classification, input
pattern, input text, field pattern, and field text override tables.
QualityStage opens the appropriate override dialog box depending on
the rule set (Domain Pre-Processor, Domain-Specific, Validation, or
WAVES/Multinational Standardize) you want to modify.
To open the Override dialog box:

1. Do one of the following:


• For Standardization overrides, select Rules ➤ Standardization
Overrides to display the list of the available rule sets:

Note: If the list of rules does not appear, select File ➤ Designer
Options from the QualityStage main window, and then
specify the path to the rules folder in the Standardize Process
Definition Directory field.

• For WAVES/Multinational Standardize overrides, select Rules
➤ WAVES/Multinational Standardization Overrides to
display the list of the available country-specific rule sets:

2. Select the rule set for which you want to specify an override.
The appropriate override dialog box appears. The dialog box tabs
let you choose the override table you wish to modify.


Domain Pre-Processor Overrides


Domain Pre-Processor Overrides let you customize a domain
pre-processor rule set. This section discusses how to define the domain
pre-processor override options for the:
• User Classification table.
• Input Pattern and Field Pattern tables.
• Input Text and Field Text tables.

Adding Classification Overrides


You can override the classification table provided with a rule set using
the Classification user override. The Classification user override lets
you add and substitute token classifications.
For example, you would add an override to the User Classification
table to classify SSTREET as a street type and assign the standard
postal abbreviation of ST. If you run the U.S. Domain Pre-processor
(USPREP) rule set on the following record without any overrides:
123 MAIN SSTREET
the data is parsed and classified as follows:

Data       Token Classification    Reason
123        ^  Numeric
MAIN       +  Unknown alpha
SSTREET    +  Unknown alpha        It is not in the Classification
                                   table due to the misspelling.
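
The effect of a classification override on this record can be modeled roughly as follows. This is a conceptual sketch, not QualityStage's implementation: the one-entry classification table, the lookup order, and the function name are assumptions, and only the ^ (numeric), + (unknown alpha), and T (street type) classes from the example are handled.

```python
# Hypothetical one-entry classification table: token -> (class, standard form).
CLASSIFICATION = {"STREET": ("T", "ST")}


def classify(token, classification, overrides=None):
    """Return (class, standard_form) for one token.

    User overrides are consulted first, then the rule set's own table.
    Anything else is treated as numeric (^) or unknown alpha (+).
    """
    key = token.upper()
    if overrides and key in overrides:
        return overrides[key]
    if key in classification:
        return classification[key]
    if key.isdigit():
        return ("^", token)
    return ("+", token)


record = "123 MAIN SSTREET".split()

# Without overrides, the misspelled SSTREET is an unknown alpha token.
print([classify(t, CLASSIFICATION)[0] for t in record])   # ['^', '+', '+']

# With the override, SSTREET becomes a street type with standard form ST.
user_overrides = {"SSTREET": ("T", "ST")}
print(classify("SSTREET", CLASSIFICATION, user_overrides))   # ('T', 'ST')
```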

To add a classification override:

1. Select Rules ➤ Standardization Overrides and select a Domain
Pre-processor rule set such as USPREP to display the override
dialog box.


2. Select the Classification tab:

3. Under Input Token, enter the word token for which you want to
override the classification, such as SSTREET. Spell the word as it
appears in the input file.
4. Under Standard Form, enter the standardized spelling of the
token, such as ST.
5. From the Classification menu, select the one-character tag that
indicates the class of the token word, such as T-Street Types.


6. Specify a Comparison Threshold value, which defines the degree
of uncertainty that will be tolerated in the spelling of the token
word, or leave it blank.

Range of Comparison      Description
Threshold Values
Blank or no value        Token must match identically
700                      High tolerance (minimum value)
750 and 800              Allows for one or two letter transformations
                         depending on the length of the word
                         (common threshold value)
950                      Zero tolerance (maximum value)

For example, leave the Comparison Threshold text box blank to
specify that the token must match identically.
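
The value range in the table can be expressed as a small validation helper. This is a sketch only: the helper name is hypothetical, and whether the dialog box itself rejects values outside 700 to 950 is an assumption, not something this guide states.

```python
MIN_THRESHOLD = 700   # high tolerance (minimum value)
MAX_THRESHOLD = 950   # zero tolerance (maximum value)


def parse_threshold(text):
    """Interpret a Comparison Threshold entry.

    A blank entry returns None, meaning the token must match identically.
    Otherwise the entry must be a whole number from 700 to 950.
    """
    text = text.strip()
    if not text:
        return None
    value = int(text)
    if not MIN_THRESHOLD <= value <= MAX_THRESHOLD:
        raise ValueError(
            f"threshold must be between {MIN_THRESHOLD} and {MAX_THRESHOLD}"
        )
    return value


print(parse_threshold(""))      # None
print(parse_threshold("800"))   # 800
```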


7. Click Add to add the override to the list box at the bottom of the
dialog box.

8. Click OK to save your edits and close the dialog box.


Next time you run the rule set, the word tokens for which you
overrode the user classifications are classified with the
designations you specified and appear with the appropriate
standard form. For example, when running USPREP, SSTREET is
classified as a Street Type and appears with the Standard Form of
ST.


Note: To continue working with user overrides for the current rule set,
click Apply to save your edits without closing the dialog box.

To discard any changes, click Cancel to close the dialog box.

See “Modifying and Maintaining Overrides” on page E-32 to learn how
to add a new override based on an existing one.

Adding Input Pattern and Field Pattern Overrides


Input Pattern and Field Pattern user overrides for Domain
Pre-processor rule sets are used to modify the Input Pattern override
table (*IP.TBL) and the Field Pattern override table (*FP.TBL),
respectively.
Both Pattern dialog boxes contain the same fields and are used the
same way. Neither allows partial pattern matching.
• The Input Pattern user override allows you to specify token
overrides based on the input pattern. These overrides take
precedence over the Pattern-Action file. Input Pattern overrides
can only be specified for the entire input pattern.
• The Field Pattern user override allows you to specify token
overrides based on one field pattern. These overrides can only be
specified for an entire field pattern.
For example, if you have the input pattern:
N^+TA+S^
you would add an override to the Input Pattern table to move the N^+T
to the address domain and the A+S^ to the area domain.
To add a pattern override:
1. Select Rules ➤ Standardization Overrides and select a Domain
Pre-processor rule set such as USPREP to display the override
dialog box.


2. Click the Input Pattern tab.

3. From the Classification Legend list, select the first token, such as
N, and then click Append this code to current pattern.
This adds the token to the Enter Input Pattern text box. The token
also appears in the Current Pattern List with the default A
(Address Domain) Override Code.
4. For each token in the input pattern, repeat step 3, such as for
tokens ^, +, T, A, +, S, and ^.


Note: You can also type the tokens directly in the Enter Input
Pattern text box. Using the list ensures that you enter valid
values.

5. You can leave the default domain setting for a token, or you can
change it. To change a token’s override code:
a. Select the Token in the Current Pattern List.
b. Select a domain from the Dictionary Fields list.
For example, for the pattern N^+TA+S^, the N^+T can keep the
default A–Address Domain. Change A+S^ to the R–Area Domain.

Tip: Alternatively, you can use the Current Pattern List and
select the Token and Override Code you want. For example,
select A from the list of tokens, and then select R from the list
of override codes.


6. Click Add to add the override to the Override Summary list.

7. Click OK to save your edits and close the dialog box.


Whenever you run the rule set on the pattern, it is tokenized as
you specified. For example, when running USPREP on the pattern
N^+TA+S^, the pattern N^+T is tokenized with the Address domain;
the pattern A+S^ is tokenized with the Area domain.
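
The override built in the steps above amounts to pairing each one-character pattern token with a domain code. A minimal sketch, assuming the default code A (Address Domain) and the code R (Area Domain) described in this procedure; the function name and the position-based argument are hypothetical:

```python
def build_pattern_override(pattern, domain_changes, default="A"):
    """Pair each pattern token with a domain override code.

    pattern:        the entire input pattern, one character per token
    domain_changes: maps a 0-based token position to a non-default code
    """
    return [
        (token, domain_changes.get(position, default))
        for position, token in enumerate(pattern)
    ]


# N^+T keeps the default A code; A+S^ is changed to R.
override = build_pattern_override("N^+TA+S^", {4: "R", 5: "R", 6: "R", 7: "R"})
for token, code in override:
    print(token, code)
```

Positions are used instead of token characters because the same token, such as + or ^, appears more than once in the pattern with different domains.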

Note: To continue working with user overrides for the current rule set,
click Apply to save your edits without closing the dialog box.

To discard any changes, click Cancel to close the dialog box.


Overriding the field pattern is done similarly, except that you click the
Field Pattern tab at step 2 instead of the Input Pattern tab.
See “Modifying and Maintaining Overrides” on page E-32 to learn how
to add a new override based on an existing one.

Adding Input Text and Field Text Overrides


Input Text and Field Text user overrides for Domain Pre-processor
rule sets are used to modify the Input Text override table (*IT.TBL)
and the Field Text override table (*FT.TBL), respectively.
Both Text dialog boxes contain the same fields and are used the same
way.
• The Input Text override allows you to specify token overrides based
on all the input text. These overrides take precedence over the
Pattern-Action file. Since they are more specific, Input Text
overrides take precedence over Input Pattern overrides.
• The Field Text override allows you to specify token overrides based
on the field text string. Since they are more specific, Field Text
overrides also take precedence over Field Pattern overrides. Field
Text overrides can only be specified for an entire field text string.
Partial string matching within a field is not allowed.
For example, you would add an override to the Input Text table if you
wanted to change the Name domain in the following text to the
Address domain. ZQNAMEZQ is the name domain delimiter and
ZQADDRZQ is the address domain delimiter:
ZQNAMEZQ MARTIN LUTHER KING ZQADDRZQ BLVD
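
To make the structure of this delimited text concrete, the following sketch splits it into domain segments and shows what treating the whole string as an address means. It assumes every domain delimiter has the form ZQ, a domain name, ZQ, generalizing from the two delimiters shown here; the function name is hypothetical.

```python
import re

# Assumed delimiter shape: ZQ + domain name + ZQ (for example ZQNAMEZQ).
DELIMITER = re.compile(r"ZQ([A-Z]+)ZQ")


def split_domains(text):
    """Split delimited text into (domain, text) pairs."""
    parts = DELIMITER.split(text)
    # parts = [text before first delimiter, domain1, text1, domain2, text2, ...]
    return [
        (parts[i], parts[i + 1].strip())
        for i in range(1, len(parts) - 1, 2)
    ]


segments = split_domains("ZQNAMEZQ MARTIN LUTHER KING ZQADDRZQ BLVD")
print(segments)   # [('NAME', 'MARTIN LUTHER KING'), ('ADDR', 'BLVD')]

# With the override described here, both segments belong to the address.
print(" ".join(text for _, text in segments))   # MARTIN LUTHER KING BLVD
```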
To add a text override:
1. Select Rules ➤ Standardization Overrides, and then select a
Domain Pre-processor rule set such as USPREP.


2. Click the Input Text tab.

3. Under Enter Input Tokens, enter the domain delimiter and the
text whose tokens you want to override. For example, enter
ZQNAMEZQ MARTIN LUTHER KING ZQADDRZQ BLVD.
Each word you enter appears in the Current Token List with the
word itself in the Token column and the default domain, such as A
(Address Domain), in the Override Code column.
4. Select the Token for the current domain delimiter of the text you
want to override, and then select the override you want to apply
from the Dictionary Fields list.


For example, to process the example text as an address, you do not
need to change any of the Override Codes, because each token already
has the default A (Address Domain) code.
Repeat step 4 for the remaining words you need to change.
5. Click Add to add the override to the Override Summary list.

6. Click OK to save your edits and close the dialog box.


Note: To continue working with user overrides for the current rule set,
click Apply to save your edits without closing the dialog box.

To discard any changes, click Cancel to close the dialog box.


When you run the customized rule set on the text, it is processed as
you specified. For example, when you run USPREP on the text
ZQNAMEZQ MARTIN LUTHER KING ZQADDRZQ BLVD, the entire text
string will be handled as an address.
Overriding the field text is done similarly, except that you click the
Field Text tab at step 2 instead of the Input Text tab.
See “Modifying and Maintaining Overrides” on page E-32 to learn how
to add a new override based on an existing one.

Creating Domain-Specific, Validation, and WAVES/Multinational Address Overrides


This section discusses how to define override options for the
Domain-Specific, Validation, and WAVES/Multinational Standardize
rule sets by modifying the:
• User Classification table
• Input Pattern and Unhandled Pattern tables
• Input Text and Unhandled Text tables

Adding Classification Overrides


You can override the classification table provided with a rule set using
the Classification user override. The Classification user override lets
you add and substitute token classifications.
The Classification user overrides for the Standardize and
WAVES/Multinational Standardize rule sets contain the same fields
(the tokens are different). You use the overrides the same way. See
“Adding Classification Overrides” on page E-11.

Adding Input Pattern and Unhandled Pattern Overrides


Input Pattern and Unhandled Pattern overrides for Domain-Specific
rule sets are used to modify the Input Pattern (*IP.TBL) and
Unhandled Pattern (*UP.TBL) override tables, respectively.


Both Pattern dialog boxes contain the same fields and are used the
same way.
• The Input Pattern override allows you to specify rule overrides
based on the input pattern. Input Pattern overrides take
precedence over the pattern-action file. Input Pattern overrides
can only be specified for the entire input pattern. Partial pattern
matching is not allowed.
• The Unhandled Pattern override allows you to specify rule
overrides based on the unhandled pattern. Unhandled Pattern
overrides work on tokens not processed by the pattern-action file.
Unhandled pattern overrides can only be specified for the entire
unhandled pattern. Partial pattern matching is not allowed.
For example, you would override the Input Pattern table if you
wanted to designate the following pattern:
^+T
this way:

Pattern Token    Type            Value
^                House Number    Original
+                Street Name     Original
T                Street Type     Standard

To add a pattern override:


1. Do one of the following:
• Select Rules ➤ Standardization Overrides, and then select a
Domain-Specific or Validation rule set such as USADDR or
VTAXID.
• Select Rules ➤ WAVES/Multinational Standardization
Overrides, and then select a country-specific rule set.


2. Click the Input Pattern tab:

3. From the Classification Legend list, select the first token, such as
^, and then click Append this code to current pattern.
This adds the token to the Enter Input Pattern text box. The token
also appears under the Current Pattern List with the default
override code AA1 (Additional Address Information and code 1).
4. Repeat step 3 for the tokens + and T.


Note: You can also type the tokens directly in the Enter Input
Pattern text box. Using the list ensures that you enter valid
values.

5. Under Current Pattern List, select the first row ^ AA1.


6. From the Dictionary Fields list, select the token that represents
the type, for example HN-House Number.
The selected row in the Current Pattern List changes to ^ HN1.
7. Repeat step 5 and step 6 for the remaining tokens. For example,
select SN - Street Name for +, and ST - Street Type for T.
8. Select the current pattern from the Current Pattern List, such as
T ST1, for the token for which you want to specify the override.
9. Select Standard Value.
The row T ST1 changes to T ST2, indicating that the standard value
from the Classification table will be used for this token in this
pattern. The rest of the tokens are left as the original value.

Note: Every selection or combination of selections under User
Override creates a code that appears next to the selected row
in the Current Value List.

For a list of action codes, see “Action Codes for
Domain-Specific, Validation, and WAVES/Multinational
Standardize Rule Sets” on page E-33.


10. Click Add to add the override to the Override Summary list.

11. Click OK to save your edits and close the dialog box.
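Whenever you run the rule set, the pattern for which you have
specified overrides is processed as you specified. For example,
when you next run the USADDR rule set, the pattern ^+T is
handled as a house number and a street name that keep their
original values, followed by a street type that uses its standard
value.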

Note: To continue working with user overrides for the current rule set,
click Apply to save your edits without closing the dialog box.

To discard any changes, click Cancel to close the dialog box.


Overriding the unhandled pattern information is done similarly,
except that you click the Unhandled Pattern tab at step 2 instead of
the Input Pattern tab.
See “Modifying and Maintaining Overrides” on page E-32 to learn how
to add a new override based on an existing one.

Adding Input Text and Unhandled Text Overrides


The Input Text and Unhandled Text overrides for Domain-Specific
and WAVES/Multinational Standardize rule sets are used to modify
the Input Text (*IT.TBL) and Unhandled Text (*UT.TBL) override
tables, respectively.
Both Text dialog boxes contain the same fields and are used the same
way.
• The Input Text user override allows you to specify rule overrides
based on the input text string. Input Text overrides take
precedence over the Pattern-Action file. Since they are more
specific, Input Text overrides also take precedence over input
pattern overrides. Input Text overrides can only be specified for the
entire input text string. Partial string matching is not allowed.
• The Unhandled Text override allows you to specify rule overrides
based on the unhandled text string. Unhandled Text overrides
work on tokens not processed by the Pattern-Action file. Since they
are more specific, Unhandled Text overrides take precedence over
Unhandled Pattern overrides. Unhandled Text overrides can be
specified only for the entire unhandled text string. Partial string
matching is not allowed.
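
The precedence described above (Input Text overrides first, then Input Pattern overrides, then the Pattern-Action file) can be modeled as a simple ordered lookup. This is a conceptual sketch, not the engine's actual code; the dictionaries, the sample override codes, and the function name are hypothetical. Whole-string dictionary keys also reflect the rule that partial matching is not allowed.

```python
def resolve(text, pattern, text_overrides, pattern_overrides):
    """Return (source, codes) for a record, most specific override first."""
    if text in text_overrides:            # the entire text string must match
        return "input text override", text_overrides[text]
    if pattern in pattern_overrides:      # the entire pattern must match
        return "input pattern override", pattern_overrides[pattern]
    return "pattern-action file", None    # no override applies


text_overrides = {"123 MAIN STREET": ["HN1", "SN1", "ST1"]}
pattern_overrides = {"^+T": ["HN1", "SN1", "ST2"]}

# The text override wins even though the pattern override also matches.
print(resolve("123 MAIN STREET", "^+T", text_overrides, pattern_overrides))

# A different record with the same pattern falls through to the pattern override.
print(resolve("45 OAK AVENUE", "^+T", text_overrides, pattern_overrides))

# A longer pattern is not a partial match for ^+T.
print(resolve("45 OAK AVENUE 7", "^+T^", text_overrides, pattern_overrides))
```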
For example, the address:
100 Summer Street Floor 15
contains two tokens that use standard values from the Classification
table. The remaining tokens are not associated with standard values
from the Classification table and use their original data values. The
following shows the tokenized address before you add any overrides:

Text      Type            Value
100       House Number    Original
Summer    Street Name     Original
Street    Street Type     Standard
Floor     Floor Type      Standard
15        Floor Value     Original

To add a text override:

1. Do one of the following:


• Select Rules ➤ Standardization Overrides, and then select a
Domain-Specific rule set such as USADDR or a Validation rule
set such as VTAXID.
• Select Rules ➤ WAVES/Multinational Standardization
Overrides, and then select a country-specific rule set.


2. Click the Input Text tab:

3. Under Input Text, enter the text string for which you want to
define a text override, such as 100 SUMMER STREET FLOOR 15.
Each text token appears in the Current Token List under the
Token column. Next to each token, the default code of AA
(Additional Address Information) plus action code 1 appears.
For a list of action codes, see the “Action Codes for
Domain-Specific, Validation, and WAVES/Multinational
Standardize Rule Sets” on page E-33.
4. Select the first text token, such as 100.


5. From the Dictionary Fields list, select the code you want, such as
HN - House Number.
The AA1 next to 100 in the Current Token List changes to HN1.
6. Repeat step 4 and step 5 for each of the remaining text tokens, for
example:

Text      Type
Summer    SN - Street Name
Street    ST - Street Suffix Type
Floor     FT - Floor Type
15        FV - Floor Value

7. Select the text token in the Current Token List, such as STREET
and then select Standard Value.
The row STREET ST1 changes to STREET ST2, indicating that the
standard value from the Classification table will be used for this
token in this pattern. The rest of the tokens are left as the original
value.
8. Repeat step 7 for each text token you wish to standardize. For
example, repeat step 7 for text token FLOOR.

Note: Every selection or combination of selections under User
Override creates a code that appears next to the selected row
in the Current Value List.


9. Click Add to add the override to the Override Summary list.

10. Click OK to save your edits and close the dialog box.
Whenever you run the rule set on the text string, it is processed as you
specified. For example, when you next run the USADDR rule set, the
text string 100 SUMMER STREET FLOOR 15 is handled accordingly.

Note: To continue working with user overrides for the current rule set,
click Apply to save your edits without closing the dialog box.

To discard any changes, click Cancel to close the dialog box.


Overriding the unhandled text information is done similarly, except
that you click the Unhandled Text tab at step 2 instead of the Input
Text tab.
See “Modifying and Maintaining Overrides” on page E-32 to learn how
to add a new override based on an existing one.

Modifying and Maintaining Overrides


The Override Summary list at the bottom of each override screen lets
you easily maintain overrides. By selecting an override from the list,
you can delete it, modify it, or create a new user override based on it.

Deleting Overrides
To delete overrides:

1. Select the overrides to delete.


• To select a single override, click anywhere within its row to
select it.
• To select a group of consecutive overrides, click within the first
override’s line. Next, Shift+Click on the last override’s line.
• To select a group of nonconsecutive overrides, Ctrl+Click
within the line of each override you want to include.
2. Click Delete to remove the selected overrides from the list.

Modifying Overrides
To modify an existing override:

1. Select the override you want to modify.


2. Click Edit to temporarily move that override’s values to the
respective editing areas in the upper part of the screen.
3. Modify the values as needed.
4. Click Add to move the override back to the Override Summary list.


Creating an Override Similar to an Existing One


To create an override similar to an existing one:

1. Select the override you want to duplicate.


2. Click Copy to copy that override’s values to the appropriate
editing areas in the upper part of the screen.
3. Modify the values to create a new override.
4. Click Add to add the override to the Override Summary list.

Action Codes for Domain-Specific, Validation, and WAVES/Multinational Standardize Rule Sets
The following table describes the actions that will be performed by the
Pattern-Action file for a given code setting.
The codes are displayed in the Override Code box of the Current
Token List box in the Domain-Specific Text and Pattern User
Override dialog boxes. You adjust the code settings by making
selections in the User Override box within the dialog boxes.

Value       Associated Actions
0 (zero)    Drop the current token.
1           Append a leading character space and then append the original
            value of the current token to the specified data type.
2           Append a leading character space and then append the standard
            value of the current token to the specified data type.
3           Append the original value of the current token, without
            appending a leading character space, to the specified data type.
4           Append the standard value of the current token, without
            appending a leading character space, to the specified data type.
5           Move all remaining tokens using their original values, leaving
            one character space between each token, to the specified data
            type.
6           Move all remaining tokens using their standard values, leaving
            one character space between each token, to the specified data
            type.
7           Move all remaining tokens to the specified data type, using their
            original values. Do not leave a character space between each
            token.
8           Move all remaining tokens to the specified data type, using their
            standard values. Do not leave a character space between each
            token.
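
The action codes can be illustrated with a small interpreter that applies codes such as HN1 or ST2 to a list of tokens. This is a sketch of the table above, not the Pattern-Action engine: the function name, the (original, standard) token pairs, the standard value FL for FLOOR, and the trimming of leading spaces from the final values are all assumptions.

```python
def apply_override(tokens, codes):
    """Apply override codes to tokens and return {data type: value}.

    tokens: list of (original_value, standard_value) pairs
    codes:  list of strings such as "HN1" (data type + action digit)
    """
    fields = {}
    for index, code in enumerate(codes):
        field, action = code[:-1], int(code[-1])
        original, standard = tokens[index]
        current = fields.get(field, "")
        if action == 0:
            continue                                    # drop the token
        if action in (1, 2):                            # leading space + value
            fields[field] = current + " " + (original if action == 1 else standard)
        elif action in (3, 4):                          # value, no leading space
            fields[field] = current + (original if action == 3 else standard)
        elif action in (5, 6, 7, 8):                    # move all remaining tokens
            use_standard = action in (6, 8)
            separator = " " if action in (5, 6) else ""
            rest = [s if use_standard else o for o, s in tokens[index:]]
            fields[field] = current + separator + separator.join(rest)
            break
        else:
            raise ValueError(f"unknown action code in {code!r}")
    # Assumption: leading spaces are trimmed from the finished values.
    return {name: value.strip() for name, value in fields.items()}


tokens = [("100", "100"), ("SUMMER", "SUMMER"), ("STREET", "ST"),
          ("FLOOR", "FL"), ("15", "15")]
print(apply_override(tokens, ["HN1", "SN1", "ST2", "FT2", "FV1"]))
print(apply_override(tokens, ["HN1", "SN5"]))
```

The first call mirrors the 100 SUMMER STREET FLOOR 15 example, with STREET and FLOOR taking their standard values. The second shows action 5 moving every remaining token into one data type.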

Testing Standardization Rule Sets


QualityStage’s Rule Analyzer lets you quickly test a selected
Standardize rule set against a one-line test string (a single record)
before running the rule set against an entire file. This time-saving
option is especially helpful when you plan to use a large input file.
Testing the rule set first helps confirm that the resulting data file
will be what you want.

Note: This feature is not available for WAVES/Multinational
Standardize rule sets.

To test a rule set:

1. Choose Rules ➤ Standardization Rule Analyzer to open the Rule
Analyzer dialog box for a specific rule set.
Depending on whether the rule set you select is a Domain
Pre-processor, Domain-Specific, or Validation rule set, the Rule
Analyzer dialog box contains different types of input fields.
2. Enter the input strings:
• For a Domain Pre-processor rule set, you can enter up to six
input strings and their respective delimiters.
• For a Domain-Specific or Validation rule set, you can enter
only one input string—delimiters don’t apply.


The following sections discuss the specifics of opening the
Standardization Rule Analyzer dialog box, testing a Domain
Pre-processor rule set, and testing a Domain-Specific or Validation
rule set.

Opening the Standardization Rules Analyzer


To open the Standardization Rules Analyzer dialog box for a specific
rule set:

1. Select Rules ➤ Standardization Rules Analyzer:

2. Select the rule set you want to test.

The associated Standardization Rules Analyzer dialog box appears:

The title bar and Rule Set box display the name of the current rule set
you are testing.
At any time you can select another rule set to test from the Rule Set
list, which lists all available rule sets. If you select a different type of
rule set (Domain Pre-processor rather than Domain-Specific or
Validation), the screen resets itself to reflect the correct type. Any
data under Input String is maintained, but the results grid at the
bottom of the screen is cleared.
The Standardization Rules Analyzer supports international rules. If
no Locale is specified, the Standardization Rules Analyzer assumes
you want to use the default locale to which your computer is set. By
specifying a different locale, you can run data against rule sets that
are not designed for the default locale.


Testing a Domain Pre-Processor Rule Set


If you select a Domain Pre-Processor rule set (name ends with PREP)
to test, a dialog box like the following appears.

The Standardization Rules Analyzer first populates each Input String
box’s history. Each individual row in a pre-processor rule set test
screen has its own history of up to five previously entered input
strings. QualityStage maintains a separate history log for each rule
set.
You can enter up to six strings of data to test. If you enter more than
one input string, the only requirement is that they be in the right
order and that the delimiter box next to each input string be set. It
doesn’t matter if you leave any blank rows in between.

Important: You must set the delimiter for each input string; you cannot leave
it at [None]. If you attempt to run the test without specifying a
delimiter for each input string, a message box appears to point
out the error.

The input string for pre-processor rule sets is a concatenation of all
the strings you enter, separated by the delimiter chosen for each
string. The input file therefore contains one long line made up of the
input strings and their delimiters.
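
The concatenation can be sketched as follows. The function name is hypothetical, and placing each delimiter in front of its string is inferred from the ZQNAMEZQ MARTIN LUTHER KING ZQADDRZQ BLVD example earlier in this appendix rather than stated for the Rules Analyzer itself.

```python
MAX_INPUT_STRINGS = 6


def build_test_input(rows):
    """Concatenate (delimiter, input_string) rows into one test line.

    Blank rows are skipped. Every non-blank string needs a delimiter
    other than [None].
    """
    if len(rows) > MAX_INPUT_STRINGS:
        raise ValueError("you can enter up to six input strings")
    parts = []
    for delimiter, text in rows:
        text = text.strip()
        if not text:
            continue                      # blank rows in between are allowed
        if not delimiter or delimiter == "[None]":
            raise ValueError(f"no delimiter set for input string {text!r}")
        parts.append(f"{delimiter} {text}")
    return " ".join(parts)


rows = [
    ("ZQNAMEZQ", "MARTIN LUTHER KING"),
    ("[None]", ""),                       # blank row, ignored
    ("ZQADDRZQ", "BLVD"),
]
print(build_test_input(rows))   # ZQNAMEZQ MARTIN LUTHER KING ZQADDRZQ BLVD
```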
To test a Domain Pre-Processor rule set:

1. If needed, change the Locale for the rule set.


2. Enter the input strings you want to test, or select a previously
entered string from the input string’s menu.
3. Select a Delimiter for each input string you entered.
4. Click Test this String to start the testing.
If the test finishes successfully, the results appear in the bottom
grid, listing each token and its fields and field descriptions.
5. To abort a test in progress, click Cancel to restore the screen to its
previous state.
6. To remove all data from all input boxes and reset the delimiters to
[None], click Clear Data.
You can then test the same rule set using a different set of input
strings, or you can use the Rule Set list to select a different rule
set to test.
When you are finished testing, click Exit.

Testing a Domain-Specific or Validation Rule Set


If you select a Domain-Specific or Validation rule set to test, a dialog
box like the following appears:

When you access this screen, the Rule Analyzer first populates the
Input String box’s history. QualityStage maintains a separate history
log of up to five previously tested input strings for each rule set.
For Domain-Specific or Validation rule sets, you can enter only one
Input String to test.
To test a Domain-Specific or Validation rule set:

1. If needed, change the Locale for the rule set.
2. Enter the input string you want to test, or select a previously
entered string from the input string’s menu.
3. Click Test this String to start testing.
If the test finishes successfully, the results appear in the bottom
grid, listing each token and its fields and field descriptions.
4. To abort a test in progress, click Cancel to restore the screen to its
previous state.
5. To remove the data from the input box and clear the results grid,
click Clear Data.
You can then test the same rule set using a different input string,
or you can use the Rule Set list to select a different rule set to test.
When you finish testing, click Exit.

User Modification Subroutines


Occasionally, more complex rule set modifications that require the use
of pattern-action language may be necessary. Each Standardize rule
set has subroutines reserved for these modifications.
You cannot use subroutines to modify WAVES/Multinational
Standardize rule sets.

Subroutine Limitations
Subroutine modifications are not portable to the next upgrade of
QualityStage. Also, the syntax must be correctly written, or
unpredictable output may result. For these reasons, we strongly
advise that you use the Standardization Overrides to control
standardization output.

Country Identifier User Subroutines


Pattern-action statements added to the input modifications
subroutine are performed before any other pattern-actions.
Modifications can be added here if you have determined that certain
conditions are completely mishandled or unhandled by the rule set.

The input subroutine section of the Pattern-Action file can be found at
the beginning of the subroutine section or by searching for:
;--------------------------------------------------
;Input_Modifications SUBROUTINE Starts Here
;--------------------------------------------------

\SUB Input_Modifications

Domain Pre-Processor User Subroutines


Subroutines exist in three places within the Pattern-Action file for a
Domain Pre-processor rule set:
• Input Modifications
• Continuation Modifications
• Field Modifications

Input Modifications
Pattern-Action statements added to the input modifications
subroutine are performed before any other pattern-actions.
Modifications should be added here if you have determined that
certain conditions are completely mishandled or unhandled by the
rule set.
The subroutine section of the Pattern-Action file is delimited by a
header, as shown here:
;--------------------------------------------------
;Input_Modifications SUBROUTINE Starts Here
;--------------------------------------------------
\SUB Input_Modifications

Continuation Modifications
The logical flow of the Domain Pre-Processor begins with the isolation
of each contiguous pair of delimited input fields to search for the
possibility of data continued across fields or split across field
boundaries. Pattern-action statements added to the continuation
modifications subroutine are performed before any other continuation
pattern-actions.
The continuation modifications subroutine of the Pattern-Action file
can be found in the subroutine section or by searching for:
;--------------------------------------------------
;Continuation_Modifications SUBROUTINE Starts Here
;--------------------------------------------------
\SUB Continuation_Modifications

Field Modifications
The second step in the Domain Pre-Processor logical flow is to isolate
each delimited input field, one at a time, to search for common domain
patterns.
Pattern-action statements added to the field modifications subroutine
are performed before any other field pattern-actions.
The field modifications subroutine of the Pattern-Action file can be
found in the subroutine section or by searching for:
;--------------------------------------------------
;Field_Modifications SUBROUTINE Starts Here
;--------------------------------------------------
\SUB Field_Modifications
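
The order in which the three Domain Pre-Processor user subroutines run can be pictured with a short Python sketch. It models only the sequence described above (input modifications first, then each contiguous pair of fields, then each field on its own). The function name and the callback interface are hypothetical and do not correspond to any QualityStage API, and the rule set's built-in pattern-actions that follow each user subroutine are left out.

```python
def run_preprocessor(fields, input_mods, continuation_mods, field_mods):
    """Return, in order, what each user hook produced.

    fields is the list of delimited input fields; each *_mods argument is a
    callable standing in for the matching user modification subroutine.
    """
    trace = []
    # Input modifications run before any other pattern-actions.
    trace.append(input_mods(fields))
    # Step 1: each contiguous pair of fields (data continued across fields).
    for left, right in zip(fields, fields[1:]):
        trace.append(continuation_mods(left, right))
    # Step 2: each field on its own (common domain patterns).
    for field in fields:
        trace.append(field_mods(field))
    return trace


if __name__ == "__main__":
    steps = run_preprocessor(
        ["JOHN DOE", "123 MAIN ST", "BOSTON MA"],
        input_mods=lambda fs: ("input", len(fs)),
        continuation_mods=lambda a, b: ("pair", a, b),
        field_mods=lambda f: ("field", f),
    )
    for step in steps:
        print(step)
```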

Domain-Specific User Subroutines


Subroutines exist in two places within a Domain-Specific
Pattern-action file:
• Input Modifications
• Unhandled Modifications
Modifications should be added here if you have determined that
certain conditions are wholly or partially mishandled or unhandled by
the rule set.

Input Modifications
Pattern-actions added to the input modifications subroutine are
performed before any other pattern-actions.
The input modification subroutine can be found in the Pattern-Action
file at the beginning of the subroutine section or by searching for:
;--------------------------------------------------
; Input_Modifications SUBROUTINE Starts Here
;--------------------------------------------------
\SUB Input_Modifications

Unhandled Modifications
Pattern-actions added to the unhandled modifications subroutine are
performed after all other pattern-actions.
The unhandled modification subroutine can be found in the
Pattern-Action file at the beginning of the subroutine section or by
searching for:
;--------------------------------------------------
; Unhandled_Modifications SUBROUTINE Starts Here
;--------------------------------------------------
\SUB Unhandled_Modifications

F

ISO Country Codes

The following table lists the 2- and 3-character ISO country codes:

Country    Two-Character    Three-Character

AFGHANISTAN AF AFG

ALBANIA AL ALB

ALGERIA DZ DZA

AMERICAN SAMOA AS ASM

ANDORRA AD AND

ANGOLA AO AGO

ANGUILLA AI AIA

ANTARCTICA AQ ATA

ANTIGUA AND BARBUDA AG ATG

ARGENTINA AR ARG

ARMENIA AM ARM

ARUBA AW ABW

AUSTRALIA AU AUS

AUSTRIA AT AUT

AZERBAIJAN AZ AZE

BAHAMAS BS BHS

BAHRAIN BH BHR

BANGLADESH BD BGD

BARBADOS BB BRB

BELARUS BY BLR

BELGIUM BE BEL

BELIZE BZ BLZ

BENIN BJ BEN

BERMUDA BM BMU

BHUTAN BT BTN

BOLIVIA BO BOL

BOSNIA AND HERZEGOWINA BA BIH

BOTSWANA BW BWA

BOUVET ISLAND BV BVT

BRAZIL BR BRA

BRITISH INDIAN OCEAN TERRITORY IO IOT

BRUNEI DARUSSALAM BN BRN

BULGARIA BG BGR

BURKINA FASO BF BFA

BURUNDI BI BDI

CAMBODIA KH KHM

CAMEROON CM CMR

CANADA CA CAN

CAPE VERDE CV CPV

CAYMAN ISLANDS KY CYM

CENTRAL AFRICAN REPUBLIC CF CAF

CHAD TD TCD

CHILE CL CHL

CHINA CN CHN

CHRISTMAS ISLAND CX CXR

COCOS (KEELING) ISLANDS CC CCK

COLOMBIA CO COL

COMOROS KM COM

CONGO CG COG

CONGO, THE DEMOCRATIC REPUBLIC OF THE CD COD

COOK ISLANDS CK COK

COSTA RICA CR CRI

COTE D'IVOIRE CI CIV

CROATIA (local name: Hrvatska) HR HRV

CUBA CU CUB

CYPRUS CY CYP

CZECH REPUBLIC CZ CZE

DENMARK DK DNK

DJIBOUTI DJ DJI

DOMINICA DM DMA

DOMINICAN REPUBLIC DO DOM

EAST TIMOR TP TMP

ECUADOR EC ECU

EGYPT EG EGY

EL SALVADOR SV SLV

EQUATORIAL GUINEA GQ GNQ

ERITREA ER ERI

ESTONIA EE EST

ETHIOPIA ET ETH

FALKLAND ISLANDS (MALVINAS) FK FLK

FAROE ISLANDS FO FRO

FIJI FJ FJI

FINLAND FI FIN

FRANCE FR FRA

FRANCE, METROPOLITAN FX FXX

FRENCH GUIANA GF GUF

FRENCH POLYNESIA PF PYF

FRENCH SOUTHERN TERRITORIES TF ATF

GABON GA GAB

GAMBIA GM GMB

GEORGIA GE GEO

GERMANY DE DEU

GHANA GH GHA

GIBRALTAR GI GIB

GREECE GR GRC

GREENLAND GL GRL

GRENADA GD GRD

GUADELOUPE GP GLP

GUAM GU GUM

GUATEMALA GT GTM

GUINEA GN GIN

GUINEA-BISSAU GW GNB

GUYANA GY GUY

HAITI HT HTI

HEARD AND MC DONALD ISLANDS HM HMD

HOLY SEE (VATICAN CITY STATE) VA VAT

HONDURAS HN HND

HONG KONG HK HKG

HUNGARY HU HUN

ICELAND IS ISL

INDIA IN IND

INDONESIA ID IDN

IRAN (ISLAMIC REPUBLIC OF) IR IRN

IRAQ IQ IRQ

IRELAND IE IRL

ISRAEL IL ISR

ITALY IT ITA

JAMAICA JM JAM

JAPAN JP JPN

JORDAN JO JOR

KAZAKHSTAN KZ KAZ

KENYA KE KEN

KIRIBATI KI KIR

KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF KP PRK

KOREA, REPUBLIC OF KR KOR

KUWAIT KW KWT

KYRGYZSTAN KG KGZ

LAO PEOPLE'S DEMOCRATIC REPUBLIC LA LAO

LATVIA LV LVA

LEBANON LB LBN

LESOTHO LS LSO

LIBERIA LR LBR

LIBYAN ARAB JAMAHIRIYA LY LBY

LIECHTENSTEIN LI LIE

LITHUANIA LT LTU

LUXEMBOURG LU LUX

MACAU MO MAC

MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF MK MKD

MADAGASCAR MG MDG

MALAWI MW MWI

MALAYSIA MY MYS

MALDIVES MV MDV

MALI ML MLI

MALTA MT MLT

MARSHALL ISLANDS MH MHL

MARTINIQUE MQ MTQ

MAURITANIA MR MRT

MAURITIUS MU MUS

MAYOTTE YT MYT

MEXICO MX MEX

MICRONESIA, FEDERATED STATES OF FM FSM

MOLDOVA, REPUBLIC OF MD MDA

MONACO MC MCO

MONGOLIA MN MNG

MONTSERRAT MS MSR

MOROCCO MA MAR

MOZAMBIQUE MZ MOZ

MYANMAR MM MMR

NAMIBIA NA NAM

NAURU NR NRU

NEPAL NP NPL

NETHERLANDS NL NLD

NETHERLANDS ANTILLES AN ANT

NEW CALEDONIA NC NCL

NEW ZEALAND NZ NZL

NICARAGUA NI NIC

NIGER NE NER

NIGERIA NG NGA

NIUE NU NIU

NORFOLK ISLAND NF NFK

NORTHERN MARIANA ISLANDS MP MNP

NORWAY NO NOR

OMAN OM OMN

PAKISTAN PK PAK

PALAU PW PLW

PANAMA PA PAN

PAPUA NEW GUINEA PG PNG

PARAGUAY PY PRY

PERU PE PER

PHILIPPINES PH PHL

PITCAIRN PN PCN

POLAND PL POL

PORTUGAL PT PRT

PUERTO RICO PR PRI

QATAR QA QAT

REUNION RE REU

ROMANIA RO ROM

RUSSIAN FEDERATION RU RUS

RWANDA RW RWA

SAINT KITTS AND NEVIS KN KNA

SAINT LUCIA LC LCA

SAINT VINCENT AND THE GRENADINES VC VCT

SAMOA WS WSM

SAN MARINO SM SMR

SAO TOME AND PRINCIPE ST STP

SAUDI ARABIA SA SAU

SENEGAL SN SEN

SEYCHELLES SC SYC

SIERRA LEONE SL SLE

SINGAPORE SG SGP

SLOVAKIA (Slovak Republic) SK SVK

SLOVENIA SI SVN

SOLOMON ISLANDS SB SLB

SOMALIA SO SOM

SOUTH AFRICA ZA ZAF

SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS GS SGS

SPAIN ES ESP

SRI LANKA LK LKA

ST. HELENA SH SHN

ST. PIERRE AND MIQUELON PM SPM

SUDAN SD SDN

SURINAME SR SUR

SVALBARD AND JAN MAYEN ISLANDS SJ SJM

SWAZILAND SZ SWZ

SWEDEN SE SWE

SWITZERLAND CH CHE

SYRIAN ARAB REPUBLIC SY SYR

TAIWAN, PROVINCE OF CHINA TW TWN

TAJIKISTAN TJ TJK

TANZANIA, UNITED REPUBLIC OF TZ TZA

THAILAND TH THA

TOGO TG TGO

TOKELAU TK TKL

TONGA TO TON

TRINIDAD AND TOBAGO TT TTO

TUNISIA TN TUN

TURKEY TR TUR

TURKMENISTAN TM TKM

TURKS AND CAICOS ISLANDS TC TCA

TUVALU TV TUV

UGANDA UG UGA

UKRAINE UA UKR

UNITED ARAB EMIRATES AE ARE

UNITED KINGDOM GB GBR

UNITED STATES US USA

UNITED STATES MINOR OUTLYING ISLANDS UM UMI

URUGUAY UY URY

UZBEKISTAN UZ UZB

VANUATU VU VUT

VENEZUELA VE VEN

VIET NAM VN VNM

VIRGIN ISLANDS (BRITISH) VG VGB

VIRGIN ISLANDS (U.S.) VI VIR

WALLIS AND FUTUNA ISLANDS WF WLF

WESTERN SAHARA EH ESH

YEMEN YE YEM

YUGOSLAVIA YU YUG

G

Sharing Dictionary Fields and Variable Names Across Rule Sets

The majority of Standardize rule sets are designed to be an
independent process. This independence means that the rule set
processes data using only the logic contained within its Pattern-Action
file and reads and writes data only to fields listed in its own
Dictionary File.
However, certain standardization exercises require the use of multiple
Standardize rule sets to correctly process input data. Sometimes these
multiple rule sets must interact with each other in terms of crossing
process boundaries and referencing values populated in the dictionary
file fields of other rule sets.
In earlier versions of QualityStage, Standardize rule sets were
allowed to cross process boundaries. Starting with INTEGRITY 3.6,
special pattern-action language syntax is required in order for a rule
set to reference values populated in the dictionary file fields of another
rule set. This new syntax is called scoping.

Scoping
Scoping allows for a precise, reliable way to guarantee that the correct
dictionary field is referenced. Scoping also allows for the new
functionality of referencing variable names in the Pattern-Action files
of other rule sets.
Scoping introduces new pattern-action language syntax to the way
that dictionary fields and variable names are referenced.
Scopes are either local or global:
• Local scopes refer only to the current rule set and do not cross
process boundaries.
• Global scopes cross process boundaries to refer to another rule set.

Log File Warning Messages


The Standardize stage writes warning messages to the log file
whenever it encounters a dictionary field or variable name scope that
it cannot resolve.
Unresolved scopes are local or global references that are made to
dictionary fields or variable names that cannot be found in any rule
set in the current job.
The pattern-action sets associated with the unresolved scopes are
ignored so that no data is lost.
Unresolved scopes do not terminate an executing Standardize stage.

Scoping For Dictionary Field Names


Dictionary fields are referenced in pattern-action language as
two-byte field names enclosed in braces.
Dictionary fields can have either a local or a global scope:
{XX} or {XX OF USAREA}
{XX} can refer only to the XX dictionary field in the current rule set.
{XX OF USAREA} can refer only to the XX dictionary field in the
USAREA rule set.

For example, if the following pattern-action set appeared in
USADDR.PAT, it would move the token to the XX field in USADDR.DCT:
&
COPY [1] {XX}
And, if the following pattern-action set appeared in USADDR.PAT, it
would move the token to the XX field in USAREA.DCT:
&
COPY [1] {XX OF USAREA}
Therefore, the scope of a dictionary field reference is considered to be
local to the current rule set unless a global scope including the rule set
name is specified.
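
The local/global rule for dictionary fields can be illustrated with a small Python sketch. It is not part of QualityStage: the function resolve_field, the dictionaries mapping, and the HN field name are hypothetical stand-ins for the contents of each rule set's Dictionary File. The sketch accepts both OF and of, because both spellings appear in this guide.

```python
import re

# {XX} is a local reference; {XX OF RULESET} is a global reference.
FIELD_REF = re.compile(r"^\{\s*(\w{2})(?:\s+OF\s+(\w+))?\s*\}$", re.IGNORECASE)


def resolve_field(reference, current_rule_set, dictionaries):
    """Return (rule_set, field) for a reference, or None if unresolved.

    dictionaries maps a rule set name to the set of field names listed in
    that rule set's Dictionary File (a stand-in for the .DCT contents).
    """
    match = FIELD_REF.match(reference.strip())
    if match is None:
        raise ValueError(f"not a dictionary field reference: {reference!r}")
    field, rule_set = match.group(1).upper(), match.group(2)
    # Without "OF <rule set>", the scope is local to the current rule set.
    target = rule_set.upper() if rule_set else current_rule_set
    if field in dictionaries.get(target, set()):
        return target, field
    return None  # unresolved scope: a caller would log a warning and skip


if __name__ == "__main__":
    dictionaries = {"USADDR": {"XX", "HN"}, "USAREA": {"XX", "SA"}}
    for ref in ("{XX}", "{XX OF USAREA}", "{SA}"):
        print(ref, "->", resolve_field(ref, "USADDR", dictionaries))
```

Returning None for an unresolved reference, rather than raising, reflects the behavior described earlier: the stage logs a warning and ignores the pattern-action set instead of terminating.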

Backward Compatibility for Dictionary Field Name Scopes


Backward compatibility is maintained because the local scope is the
exact same syntax that has always been used in the Standardize
stage. Therefore, this change will not make previous working rules
fail.
However, if a shared situation is required in a previous rule set, the
Pattern-Action file of the rule set must be modified to use a global
scope reference.

Modifying a Previous Rule Set


If you have a previous rule set that references a dictionary field that
does not appear in the rule set’s own Dictionary File, you must add a
global scope to all references to that field in the rule set’s
Pattern-Action file.

Example
In previous versions of QualityStage, the GEOCODE rule set’s
Pattern-Action file referenced the state abbreviation field ({SA}) in the
Dictionary File of the PLACE rule set:
[ {SA} = "PR" ]
In order for the above pattern to test true, it must be modified to
include a global scope reference to the PLACE rule set:
[ {SA of PLACE} = "PR" ]
All other references to {SA} in the GEOCODE Pattern-Action file must
be changed to {SA of PLACE}. If the global scopes are not added, the
executing Standardize stage does not terminate, but the patterns do not
test true, and therefore the associated actions are not performed.

Scoping for Variable Names


Variable names are referenced in pattern-action language as an
alphanumeric name of up to 32 characters where the first character is
always alphabetic. Variables can have either a local or a global scope:
temp or temp<USAREA>
temp can refer only to the temp variable in the current rule set.
temp<USAREA> can refer only to the temp variable in the USAREA
rule set.

Example
If the following pattern-action set appeared in USADDR.PAT, it would
move the token to the USADDR temp variable:
&
COPY [1] temp
And, if the following pattern-action set appeared in USADDR.PAT, it
would move the token to the USAREA temp variable:
&
COPY [1] temp<USAREA>

Therefore, the scope of a variable reference is considered to be local to
the current rule set unless a global scope including the rule set name
is specified.
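
As a rough illustration, the variable naming and scoping rules can be expressed as a Python parser. The function parse_variable is hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits only; the guide does not say whether other characters are accepted.

```python
import re

# name or name<RULESET>; names are alphanumeric, at most 32 characters,
# and must start with a letter (ASCII-only is an assumption of this sketch).
VARIABLE_REF = re.compile(r"^([A-Za-z][A-Za-z0-9]{0,31})(?:<(\w+)>)?$")


def parse_variable(reference, current_rule_set):
    """Split a variable reference into (owning_rule_set, variable_name)."""
    match = VARIABLE_REF.match(reference.strip())
    if match is None:
        raise ValueError(f"invalid variable reference: {reference!r}")
    name, rule_set = match.groups()
    # No <RULESET> suffix means the variable is local to the current rule set.
    return (rule_set or current_rule_set), name


if __name__ == "__main__":
    # Variables keyed by (rule set, name), so the two temp variables stay apart.
    variables = {}
    variables[parse_variable("temp", "USADDR")] = "token A"
    variables[parse_variable("temp<USAREA>", "USADDR")] = "token B"
    print(variables)
```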

Backward Compatibility for Variable Name Scopes


Backward compatibility is maintained because the local scope is the
exact same syntax that has always been used in the Standardize
stage. Therefore, this change will not make previous working rules
fail.
Since the sharing of variable names was not previously allowed in the
Standardize stage, you do not need to worry about updating any
previous rule sets with global scope references for variable names.

Important: Rule sets using global scope references for either dictionary fields
or variable names are not compatible with versions of
INTEGRITY before release 3.6.


H

Using AuditStage with QualityStage

QualityStage XE licenses include a copy of AuditStage, a software tool
that lets you apply quality control methods to manage the accuracy,
consistency, completeness, and integrity of data.
On its own, AuditStage provides a powerful means of examining and
analyzing your data before and after standardization and matching in
QualityStage.
You can combine the strengths of both tools by using specific
AuditStage and QualityStage functions together. This
appendix describes how to use those functions.
It is assumed that you are familiar with how AuditStage works. To
learn more, consult the documentation set provided with your
software, which includes:
• AuditStage User’s Guide
• AuditStage Installation Guide
• AuditStage Methodology and Application Guide

Source File Aggregation


AuditStage lets you aggregate data from sources such as DB2 and
other relational database tables and bring them into QualityStage as
delimited text files or fixed-field flat files.
You can use AuditStage’s ODBC connection and filtering capabilities
to extract and join data from any number of relational database tables
into a single table. AuditStage’s export function lets you export this
aggregated data table as a delimited text file that contains all header
information.

Note: We recommend that you use a delimited text file format when
exchanging files between AuditStage and QualityStage.

Overview of Building a Source File from Data Tables


You perform the following basic steps in AuditStage:

1. Access your source data from AuditStage by making an ODBC
connection to your database (see Connecting to an External
Database in Chapter 4, Projects and Databases in the AuditStage
User’s Guide).
2. Use the Data Filters function to define a filter that includes the
columns from the tables you want to include in your QualityStage
source file. Select Data Filters from the AuditStage menu bar, and
then create a new data filter.
See Chapter 8, Data Filters and Chapter 9, Types of Data Filter
Checks for information on how to create data filters.
3. After you save and run your filter, use Tables ➤ Export ➤
General... from the menu bar to select the table and columns you
want to export. Information on exporting tables is in Chapter 18,
Table Management.

Important: By default, AuditStage exports the file as a comma-delimited .csv
file. Before using this file in your Standardize or Match job, use
the Format Convert stage to convert it to a flat file.
Alternatively, you can export the file directly as a flat file by
appending a .txt extension to the file name in the Export Filename
dialog box.

Exporting a Sample Source File


If you want to analyze a sample of the source file, you can create one
for exporting to QualityStage as described in “Sampling the Results
Data” on page H-5.

Pre-Standardization Validation
Before running a Standardize stage in QualityStage, you can use
AuditStage to complement the QualityStage Investigate stage.
Identifying problem areas in the data before standardizing it makes
the results more meaningful and thus enhances the
productivity of your QualityStage work session.

Validating at the Row Level


Using AuditStage, you can facilitate the process of identifying missing
or erroneous domain values at the row level.
AuditStage, using rules you define, identifies whether the value is
valid or not. For example, if an indicator of Y or N is defined as valid
in a column, any other value is regarded as an error. AuditStage
identifies these erroneous values and counts how often each
error occurs throughout the file.
For more complex analysis, you create business rules in AuditStage to
verify that selected fields have values that are valid in relation to each
other within the row. For example, you can verify that the values
under the Region column correspond correctly to values in a State
Code column. If a row contains the Region code NE (Northeast), a
State Code of TX (Texas) in the same row would be invalid per your
business rule.
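
The two kinds of row-level check described above can be sketched in Python. This is an illustration only and does not use AuditStage: the column names, the function names, and the region-to-state mapping are hypothetical examples of rules a user might define.

```python
from collections import Counter

VALID_INDICATORS = {"Y", "N"}
# Hypothetical business rule: which State Codes are valid for each Region.
STATES_BY_REGION = {
    "NE": {"CT", "MA", "ME", "NH", "RI", "VT"},
    "SW": {"AZ", "NM", "OK", "TX"},
}


def domain_errors(rows, column, valid_values):
    """Count how often each invalid value occurs in one column."""
    return Counter(row[column] for row in rows if row[column] not in valid_values)


def cross_field_errors(rows):
    """Return the rows whose State Code is not valid for their Region."""
    return [
        row for row in rows
        if row["state"] not in STATES_BY_REGION.get(row["region"], set())
    ]


if __name__ == "__main__":
    rows = [
        {"indicator": "Y", "region": "NE", "state": "MA"},
        {"indicator": "X", "region": "NE", "state": "TX"},  # both checks fail
        {"indicator": "", "region": "SW", "state": "TX"},   # missing indicator
    ]
    print(domain_errors(rows, "indicator", VALID_INDICATORS))
    print(cross_field_errors(rows))
```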

Using Your AuditStage Results in QualityStage


After collecting the information, you can export the AuditStage
valid/invalid tables and merge them into their corresponding
QualityStage lookup or classification tables used in the
Standardization rule sets. This capability enables business users to
generate accurate business rules for the data they will be
standardizing.
Refer to the AuditStage Methodology and Application Guide for a
description of the methodologies of Domain Analysis, and
Completeness and Validity Assessment.

Tuning Standardization and Matching Jobs


During the development phase of your QualityStage project, you can
use AuditStage’s methodology to test and report on the output of your
Standardize and Match jobs as you refine them.
In addition, you can streamline testing by using AuditStage’s
sampling capabilities to create a statistically valid data subset of the
entire data set, thus speeding the development process without
compromising accuracy.

Testing the Results Data


Using AuditStage’s filtering capabilities, you can test the results of
your QualityStage jobs.
For standardization, you can test for:
• Unhandled patterns and text
• Numbers of invalid values detected
• Completeness
For matching, you can use multiple field duplicate detection testing
and compare the results against QualityStage’s Match test methods.

Accessing QualityStage Data for Testing


To access your QualityStage data from AuditStage, select Tables ➤
Attach ➤ Delimited Text File... from the menu bar, and then locate the
results file you want to test.

Sampling the Results Data


After establishing an initial baseline performance level using the full
results data, you can significantly shorten development time by using
sample data from your results file.
AuditStage’s sophisticated sampling capabilities let you create
statistically valid subsets of your source data. These samples serve as
fair representations of the whole. When you fine-tune your business
rules using these samples, you can achieve reliable results without
the time and resource cost of testing against a large data file.

Creating and Using a Sample Data Set


AuditStage enables you to quickly take a valid statistical sample of
your data. To create and use a sample set of data:

1. Use the Sample Clause function to create your sample file. See
Chapter 9, Types of Data Filter Checks in the AuditStage User’s
Guide for more information.
2. When defining the sample, do one of the following:
a. For standardization, create a random sample of records of
your QualityStage results file.
b. For matching, draw a random sample of unique Match Set IDs,
rather than rows, from your results file. Then extract all the
records that share those Match Set IDs for the sample.
3. Create a data filter for your sample QualityStage source file that
defines the columns you want to test.
Based on the results of your sample tests, you can modify your rule
sets to better meet the requirements of your source data.
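
The difference between the two sampling approaches in step 2 can be shown with a short Python sketch. It is a simplified stand-in for the AuditStage Sample Clause, not a description of it: the function names and the match_set_id key are hypothetical, and the sketch draws a simple random sample without computing a statistically justified sample size.

```python
import random


def sample_records(records, fraction, seed=None):
    """Random sample of rows, as used for standardization results."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not records:
        return []
    rng = random.Random(seed)
    size = max(1, round(len(records) * fraction))
    return rng.sample(records, size)


def sample_match_sets(records, fraction, id_key="match_set_id", seed=None):
    """Sample whole match sets: pick IDs at random, keep every record in them."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ids = sorted({record[id_key] for record in records})
    if not ids:
        return []
    rng = random.Random(seed)
    size = max(1, round(len(ids) * fraction))
    chosen = set(rng.sample(ids, size))
    return [record for record in records if record[id_key] in chosen]


if __name__ == "__main__":
    # Ten match sets of three records each.
    results = [{"match_set_id": i // 3, "row": i} for i in range(30)]
    sample = sample_match_sets(results, 0.3, seed=1)
    print(sorted({r["match_set_id"] for r in sample}), len(sample))
```

Sampling by Match Set ID keeps each group of matched records intact, whereas sampling individual rows would split match sets and make the match results impossible to evaluate.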

Note: Read Appendix B, Statistical Sampling in the AuditStage
Methodology and Application Guide. See the section, Applying
Statistical Sampling, for specific information about sample sizes
and confidence intervals.

Maintaining Your QualityStage Jobs


After you put your QualityStage jobs into full production mode, you
probably want to periodically test them to ensure they still meet your
initial benchmark metrics.
You can use the testing methodologies and sampling techniques from
your development work to monitor current performance. To automate
the process, you can define scripts that use the AuditStage filters.
Ongoing testing of both the input and output data of your
QualityStage projects ensures that quality issues are detected and
corrected early, thus avoiding the risk of a major project
reconfiguration.

Index

A B
Abbreviate stage 6-2 blocking 12-5
adding specifying 12-27
jobs 6-4 Build stage 6-3
projects 4-2 business names, standardizing with the
stages to jobs 6-5, 6-6 Standardize stage 10-30, D-1
stages to user-defined jobs 6-9
add-on modules, using 4-28
Advanced Options dialog box C
for Character mode 9-10 Character Discrete 15-15
for Word mode 9-22 Character mode
Append Field Selection dialog box 10-20 about 9-5
Arrayfields dialog box 4-27 about Pattern reports 9-5
arrays Concatenate option 9-7
adding fields to 4-27 creating job 9-9
assigning missing values 4-27 Discrete option 9-6
assigning special treatment 12-40 Character Mode dialog box 9-9
defining 4-27 classification
using with the Match stage 12-36 in Pattern-Action file C-9
AuditStage with rule sets C-9
aggregating data sources H-2 Classification Table
documentation set H-1 special classes C-8
job maintenance H-6 threshold weights C-7
Pre-standardization validation H-3 COBOL Copybook, importing 4-9
sampling data H-5 Collapse stage 6-2
tuning standardization and Command Definition dialog box 10-15
matching H-4 copying projects 4-3

QualityStage Designer User Guide I-1


INDEX

Country Identifier data files 4-14


delimiter 10-31 fields 4-15
rule set 10-30, D-1 jobs 6-5
creating deploying
datafile definitions 4-16 creating run profiles 7-5
IMF files A-5 local Windows server 7-11
jobs 6-4 OS/390 server 7-9
projects 4-1 UNIX server 7-10
queries 15-5 Windows server 7-10
run profiles 5-1 design a report 15-4
specifications 15-8 design view 15-6
tables 15-4 Designer options
customer support xxxii default import directory 3-12, 3-13
cutoffs location of Standardize rule set
assigning 12-34 files 3-12
assigning for Match pass 12-34 setting 3-12, 3-13, 3-14
defining 12-8 Standardize process definition
directory 3-13
working directory 3-12, 3-13
D dialog boxes, using 3-10
Data Field wizard 4-26 Dictionary File C-2
Data File and Report Viewer 16-1 format C-3
data files documentation conventions xxx
adding 4-16 Domain Pre-Processor rule sets 10-5, D-3
copying 4-18 creating overrides E-9, E-11
copying definition to create 4-18 testing E-37
creating definition 4-16 Domain-Specific rule sets 10-9, D-10
defining 4-14 creating overrides E-22
defining fields 4-15, 4-26 testing E-39
deleting fields 4-25
modifying fields 4-24
required format 4-15 E
viewing 16-1 error messages A-11
data stream mode exporting projects 4-4
definition 7-3 extract statements
staging 7-7 MOVE space 13-27
debug file, Match 13-28 MOVE-FieldName 13-25
default import directory 3-13 MOVE-Literal 13-23
defining MOVE-LR 13-26
arrays 4-27 MOVE-Variable 13-24

I-2 QualityStage Designer User Guide


INDEX

extracts flat file 15-10


about 12-9 Format Convert stage 6-3
creating statements 13-22
customizing 13-15
defaults 13-14 G
defining customized 13-16 groups, about 14-3
generating for Match 13-13
maintaining statements 13-23
specifying record layout 13-20 H
statements and arguments 13-23 HIGH statement for Match report 13-9
types 13-16 histogram 13-31
viewing 16-1
I
F IMF A-1
fields creating an IMF file A-5
assigning special treatement 12-40 fatal error messages A-11
defining 4-15, 4-26 importing
deleting 4-25 COBOL Copybook 4-9
modifying 4-24 projects 4-8
file definition setting default directory 3-12
modifying properties 4-19 input data location
file format local Windows 7-11
input files 4-15 OS/390 7-9
file mode UNIX 7-10
definition 7-3 Windows 7-10
deploying 7-8 Investigate stage
File Mode Execution dialog box 7-8, 8-7, about 9-2
9-27, 10-28, 11-14, 12-46, 14-22 about Character Concatenate
file types option 9-7
a.FRQ 9-15 about Character Discrete option 9-6
c.FRQ 9-15 about field mask 9-7
n.DLT 9-16 advanced options for Word mode 9-18
p.FRQ 9-5, 9-13 character type 9-7
p.SRT 9-5, 9-13 creating Character mode job 9-9
u.DLT 9-16 creating jobs 9-3
files creating Word mode job 9-21
Dictionary File C-2 Pattern reports for Character mode 9-5
Pattern-Action C-8, C-11 Pattern reports for Word mode 9-13
Rule Set Description C-14 rule sets 9-12

QualityStage Designer User Guide I-3


INDEX

using Character mode 9-5 L


using Word mode 9-11 literals 10-15
Word Classification reports 9-16 using with the Standardize stage 10-11
Word Frequency reports 9-14 log files A-11
Investigation word report 15-16 lookup tables C-14
LOW statement for Match report 13-9
J
jcl_cnv command A-7 M
Job Run Options dialog box 12-43 managing stages 6-5
for Match 12-43 Mask Field Selection dialog box 9-10
for Survive 14-19 masks 9-7
jobs Match
adding 6-4 about matching 12-6
adding operations to 6-5 about weights 12-7
adding stages to 6-6, 6-9 assigning cutoffs 12-34
creating 6-4 creating statements for extract record
customizing 6-3 layout 13-22
debugging 7-8 customizing reports 13-3
defining 6-5 defining custom report 13-4
defining fields 4-15 defining output file for report 13-3
defining stages 6-8 extracts 13-13
deploying 7-1, 7-5 generating reports 13-2
file mode 7-8 maintaining statements for extract
modifying stages 6-9 record layout 13-23
output files 6-7 report extract statements and
output files location 8-13 arguments 13-23
removing stages 6-6 reports 13-2
reordering stages 6-6 setting job parameters 12-48
results file 4-16 specifying report layout 13-7
running 7-1, 8-1–8-13 specifying weight overrides 12-36
running in file mode 7-8 statistics report 13-28
running on local Windows server 8-2 unduplicating 12-8
running on remote servers 8-2 using arrays 12-36
setting server environment 5-1 using default reports 13-2
stages 6-2 using Reverse Matching 12-35
user-defined 6-3 Match comparison
using 6-2 ABS_DIFF B-1
AN_DINT B-2
AN_INTERVAL B-3


CHAR B-4 adding job 12-12
CNT_DIFF B-5 assigning cutoffs 12-34
D_INT B-6 customizing extracts 13-15
D_USPS B-7 data files 12-10
DATE8 B-8 defining a pass 12-27
DELTA_PERCENT B-10 defining custom extracts 13-16
DISTANCE B-11 defining input files 12-10
INT_TO_INT B-12 defining output files 12-10
INTERVAL_NOPAR B-13 defining output files for extracts 13-15
INTERVAL_PARITY B-14 defining procedure 12-11
LR_CHAR B-15 defining Vartypes 12-40
LR_UNCERT B-16 histogram 13-31
MULT_EXACT B-18 matching types 12-11
MULT_RANGE B-18 specifying blocking fields 12-27
MULT_UNCERT B-19 specifying extract record layout 13-20
NAME_UNCERT B-20 specifying matching fields 12-30
NUMERIC B-21 specifying m-probability 12-35
PREFIX B-22 specifying u-probability 12-35
PRORATED B-22 using default extracts 13-14
TIME B-24 Match Stage wizard 12-13
UNCERT B-25 Match Summary Report 15-31
USPS B-26 Match Unduplication Summary
USPS_DINT B-27 Report 15-31
USPS_INT B-30 Match Wizard - Blocking Variables dialog
Match Extract Stage wizard 13-17 box 12-28
match grouping 15-30 Match Wizard - Match Pass dialog
Match Grouping Summary Report 15-30 box 12-31
Match Histogram Report 15-30 Match Wizard - Match Specifications dialog
Match Output Review Report 15-30 box 12-26
Match pass matching
defining 12-27 arrays 12-36
Match Report Specification dialog box 13-8 assigning cutoffs 12-34
Match Report Stage wizard 13-4 blocking phase 12-5
Match reports 13-1–13-32 reverse matching 12-35
Match stage reverse with Match 12-35
about 12-3 specifying blocking 12-27
about blocking 12-5 specifying fields 12-30
about cutoffs 12-8 types for Match stage 12-11
about m-probability 12-7 with the Match stage 12-6
about u-probability 12-7 metadata delimiters 10-15


Microsoft Access database 15-10 defining data qualifiers 5-9
missing values, assigning to arrays 4-27 defining run profile 5-4
modifying execution class 5-7
file definition properties 4-19 input data location 7-9
stages 6-8 output class, defining for OS/390 5-7
stages in user-defined jobs 6-9 output files from jobs 6-7
MOVE statement for Match report 13-10 override tables C-14, C-15
MOVEALL statement for Match Multinational Standardize stage C-15
extract 13-28 overrides E-9
MOVE-FieldName arguments 13-25 creating for domain pre-processor rule
MOVE-Literal arguments 13-23 sets E-9, E-11
MOVE-LR arguments 13-26 creating for domain-specific rule
MOVELR statement for Match sets E-22
report 13-13 creating from existing E-33
MOVE-Variable arguments 13-24 deleting E-32
moving dialog box items 3-11 modifying E-32
m-probability
defining 12-7
specifying 12-35 P
Multinational Standardize stage 10-24, Parse stage 6-2
11-1 pass
output fields 11-15 defining a Match pass 12-27
override tables C-15 pattern matching, with rule sets C-8
overrides E-5 pattern rules
rule sets C-15 in Pattern-Action file C-8
using 10-24 Pattern-Action file C-8, C-11
actions C-11
classification C-9
N pattern matching C-8
names, standardizing with the Standardize patterns C-11
stage 10-30, D-1 tokens C-9
navigating QualityStage main window 3-4 patterns
reports 9-13
rule sets C-11
O preparing data for QualityStage 2-10
ODBC 15-10 process definition directory 3-13
options Profile Definition dialog box
client settings 3-14 for local Windows server 5-17
OS/390 server 5-7 for OS/390 server 5-5
adding application definitions 5-9 for UNIX server 5-11


for Windows server 5-11 specifying the data location 15-10
Program stage 6-3 test and debug 15-7, 15-8
Project Profile dialog box QualityStage WAVES
for OS/390 server 5-8 overrides E-5
project settings 5-14 Query wizard 15-5
projects
adding 4-2
copying 4-3 R
creating 4-1 Re-engineering workflow 2-1
exporting 4-4 Related documentation xxviii
importing 4-8 report extract statements
using 4-1 MOVE space 13-27
MOVE-FieldName 13-25
MOVE-Literal 13-23
Q MOVE-LR 13-26
QualityStage MOVE-Variable 13-24
introduction 1-2 Report Viewer 16-1
navigating 3-4, 3-10 accessing 16-2
preparing data for 2-10 dialog box 16-3
prerequisites xxviii troubleshooting 16-6
starting QualityStage Designer 3-2 Report wizard 15-6
using 3-1–3-16 reports
using dialog boxes 3-10 customizing for Match 13-3
using main window 3-4 defining Match customized 13-4
QualityStage Reports generating for Match 13-2
create custom reports 15-2, 15-4, 15-5, Match defaults 13-2
15-6 specifying layout for Match 13-7
flat files 15-11 types for Match 13-4
how reports work 15-2 viewing 16-1
how to generate and view 15-2 working with 13-1–13-32
Microsoft Access database 15-11 defining fields 4-16
ODBC 15-11 defining fields 4-16
predefined 15-14 Rule Analyzer
predefined descriptions 15-15, 15-16, about E-34
15-20, 15-21, 15-22, 15-23, 15-24, accessing E-35
15-25, 15-26, 15-27, 15-28, 15-30, testing Domain Pre-Processor rule
15-31 sets E-37
reports database 15-14 testing Domain-Specific rule sets E-39
running and viewing 15-12, 15-13 Rule Set Description file C-14
select the report location 15-12 rule sets 10-2


about E-1 defining for OS/390 server 5-4
accessing Rule Analyzer E-35 defining for UNIX server 5-10
actions C-11 defining for Windows server 5-10
alternate directory C-15 OS/390 5-4
classification C-9 Windows 5-16
Classification Table special classes C-8 running
Classification Table threshold jobs 8-1–8-13
weights C-7 Standardize jobs 10-24
creating overrides E-9, E-11, E-22
creating overrides from existing
ones E-33 S
debugging E-34, E-35 saved entries 15-13
deleting overrides E-32 select and run 15-12
Domain Pre-Processor 10-4, 10-5, D-3 Select stage 6-2
Domain Pre-Processor process E-2, E-6 selecting dialog box items 3-11
Domain-Specific 10-4 servers
Domain-Specific process E-8 creating run profiles 5-1
interdependence 10-29 data files on 5-1
location C-15 deploying jobs 5-1
modifying overrides E-32 running jobs on 7-1
Multinational Standardize stage C-15 setting file structure 5-1
override tables C-15 setting Designer options 3-14
pattern matching C-8 Sort stage 6-2
patterns C-11 stages
processing flow 10-29 Abbreviate 6-2
testing E-34 adding to jobs 6-6, 6-9
Domain Pre-Processor E-37 adding to user-defined jobs 6-9
Domain-Specific E-39 Build 6-3
user modification subroutines E-40 Collapse 6-2
user subroutines E-40 customizing 6-3
using 9-12 defining 6-8
using overrides E-9 for jobs 6-2
rules Format Convert 6-3
creating for Survive stage 14-23 managing 6-4, 6-5
format for Survive stage 14-24 modifying 6-8
operators for Survive stage 14-25 modifying in user-defined jobs 6-9
processing for Survive stage 14-26 Parse 6-2
syntax examples 14-27 Program 6-3
run profiles 5-1 removing from user-defined jobs 6-6
defining for local Windows server 5-16 Select 6-2


Sort 6-2 rule sets 10-2
Transfer 6-2 setting location of rule set files 3-12
Unijoin 6-2 standardizing names 10-30, D-1
using 6-2 using literals 10-11
Standardization Appended CC Area Standardize Stage wizard 10-14
Report 15-24, 15-25, 15-26 standardizing
Standardization CC Address Report 15-24 names, using the Standardize
Standardization CC Appended Address stage 10-30, D-1
Report 15-25 starting QualityStage Designer 3-2
Standardization CC Appended Name statements
Report 15-23 arguments for Match report
Standardization CC Appended Prep extracts 13-23
Report 15-26 creating for Match report
Standardization CC Appended extracts 13-22
Report 15-20 maintaining for Match report
Standardization CC Appended Report with extracts 13-23
Prep 15-22 statistics report
Standardization CC Appended Summary Match 13-28
Report 15-26, 15-27 specifying for Match 12-48
Standardization CC Area Report 15-23 subroutines
Standardization CC Name Report 15-22 user rule set overrides E-40
Standardization CC Prep Report 15-25 subroutines, user
Standardization CC Report with country identifier E-40
Prep 15-21 Domain Pre-Processor E-41
Standardization CC Summary Domain-Specific E-42
Report 15-26 modification of rule sets E-40
Standardization WAVES/Multinational Survive stage
Appended Report 15-28 about 14-2
Standardization WAVES/Multinational creating jobs 14-4
Report 15-28 creating rules 14-23
Standardize process definition defining data files 14-4
directory 3-13 grouping records 14-3
Standardize stage processing rules 14-26
about 10-2 rule operators 14-25
about results 10-10 rules format 14-24
creating job 10-13 Survive Stage wizard 14-6
data files 10-11
defining input file 10-11
defining results file 10-12 T
description of output 10-10 tables


lookup C-14 V
override C-14, C-15 Validation rule sets 10-10
threshold weights Vartype dialog box 12-40
Classification Table C-7 Vartypes, defining 12-40
tokens
in Pattern-Action file C-9
with rule sets C-9 W
Transfer stage 6-2 Weight Override dialog box 12-38
troubleshooting weights
Report Viewer 16-6 calculating with Match 12-7
specifying overrides for Match 12-36
Windows server
U advanced project settings 5-14
unduplicating defining directories for 5-14
using Match 12-8 defining local Windows run profile 5-16
Unijoin stage 6-2 defining run profile 5-10
UNIX server location of data directory 5-14
advanced project settings 5-14 Word mode
defining directories 5-14 about 9-11
defining run profile 5-10 about Pattern reports 9-13
input data location 7-10 about rule sets 9-12
location of data directory 5-14 about Word Classification reports 9-16
u-probability about Word Frequency reports 9-14
defining 12-7 creating job 9-21
specifying 12-35 setting advanced options 9-18
US Standardization Before/After 15-20 Word Mode dialog box 9-21
US Standardization with PREP Summary word report 15-16
Report 15-27 workflow 2-1
user overrides E-9 working client directory 3-13
using working with Match reports 13-1–13-32
add-on modules 4-28 working with QualityStage reports 15-1–
dialog boxes 3-10 15-32
additional menus 3-11
browsing 3-12
moving items 3-11
selecting items 3-11
Multinational Standardize stage 10-24
projects 4-1
QualityStage 3-1–3-16
QualityStage main window 3-4
