by Haibin Shu

How to Protect Data Integrity?

Just as integrity is an important characteristic of a human being, data integrity lays the foundation for valid analyses and reports. The integrity of data includes the following important components:

  • Accuracy: the data truly reflect the source records and are free of transcription errors. This mirrors the human traits of honesty and truthfulness.
  • Consistency: the data are free of logical errors and consistent across domains, visits, devices, etc. (e.g., when the same measurement is collected multiple times during the course of a study). Data types, formats, and conventions should also be consistent between data entry and data extraction. This requirement corresponds to the personal traits of dependability and accountability.
  • Security: defines roles, access levels, scopes, activities, etc. It is important to allow the right roles to perform the right tasks and to prevent the wrong roles from doing so. This corresponds to the characteristics of self-control and self-discipline.
  • Traceability: who did what, when, and/or why. Just as in real life, keeping a track record is essential for data integrity.
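
Security and traceability work together in practice. A change should be allowed only for an authorized role, and every allowed change should leave a who/what/when/why record. The minimal Python sketch below shows that idea. The role names, the `PERMISSIONS` map and the `Record`/`AuditEntry` classes are all hypothetical. A compliant clinical platform provides its own, far more complete audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role-to-permission map; a real platform defines its own roles.
PERMISSIONS = {
    "site_user": {"enter", "update"},
    "data_manager": {"query"},
    "monitor": set(),
}


@dataclass(frozen=True)
class AuditEntry:
    """One audit-trail line: who changed what, when, and why."""
    user: str
    role: str
    field_name: str
    old_value: object
    new_value: object
    reason: str
    timestamp: datetime


@dataclass
class Record:
    values: dict
    audit_trail: list = field(default_factory=list)

    def update(self, user, role, field_name, new_value, reason):
        # Security: only roles granted "update" may change data.
        if "update" not in PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not update data")
        # Traceability: every change must carry a reason.
        if not reason.strip():
            raise ValueError("a reason for change is required")
        old_value = self.values.get(field_name)
        self.values[field_name] = new_value
        # This sketch only appends entries; it never edits or removes them.
        self.audit_trail.append(AuditEntry(
            user, role, field_name, old_value, new_value, reason,
            datetime.now(timezone.utc),
        ))


record = Record({"SYSBP": 210})
record.update("jdoe", "site_user", "SYSBP", 120, "transcription error, corrected per source")
print(record.audit_trail[0].old_value, "->", record.audit_trail[0].new_value)  # 210 -> 120
```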

How to protect data integrity?

Protecting data integrity requires commitment to four Ps:

  • Platform: a secure system that is 21 CFR Part 11 compliant and provides a good front end for data entry, discrepancy management, and other end-user interfaces. It also provides a strong back end that allows programmers to design and implement a database with the required edit checks, metrics reports, and other important or study-specific functions efficiently and effectively.
  • Process: a rigorous process specifies roles, functions, workflows, teamwork, responsibilities, etc. It should include change control management.
  • People: committed people are the key to protecting data integrity. Paying attention to details, staying alert to any deviation that might compromise data integrity, and taking proactive steps before mistakes can happen all demonstrate the needed commitment. Timely and ongoing training helps more people become committed to data integrity.
  • Passion: protecting data integrity requires corporate-level consensus and collaborative efforts.
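
The edit checks a platform's back end should support are often simple consistency rules. One example is checking a repeated measurement across visits. The Python sketch below illustrates such a rule. The 2 cm height tolerance and the `(subject_id, visit_number, height_cm)` record layout are hypothetical. Real limits come from the study's edit-check specification.

```python
from collections import defaultdict

# Hypothetical limit: an adult's height should not differ by more than 2 cm
# between visits. A real edit-check specification sets its own limits.
HEIGHT_TOLERANCE_CM = 2.0


def check_height_consistency(records, tolerance=HEIGHT_TOLERANCE_CM):
    """Return a discrepancy message for each visit whose height is inconsistent.

    `records` is an iterable of (subject_id, visit_number, height_cm) tuples.
    Each later visit is compared with the subject's earliest visit.
    """
    by_subject = defaultdict(list)
    for subject_id, visit, height_cm in records:
        by_subject[subject_id].append((visit, height_cm))

    discrepancies = []
    for subject_id in sorted(by_subject):
        visits = sorted(by_subject[subject_id])  # order by visit number
        first_visit, first_height = visits[0]
        for visit, height_cm in visits[1:]:
            if abs(height_cm - first_height) > tolerance:
                discrepancies.append(
                    f"{subject_id}: height {height_cm} cm at visit {visit} vs "
                    f"{first_height} cm at visit {first_visit}"
                )
    return discrepancies


data = [("001", 1, 170.0), ("001", 2, 170.5), ("002", 1, 165.0), ("002", 2, 156.5)]
for message in check_height_consistency(data):
    print(message)  # 002: height 156.5 cm at visit 2 vs 165.0 cm at visit 1
```

Each message would typically become a data query for the site rather than an automatic correction.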

by Haibin Shu

Left-hand Programming vs. Right-hand Programming?

Double programming is a gold standard for statistical programming and analysis teams. Its purpose is to ensure that the data are processed correctly and that the analysis is conducted correctly according to pre-specified requirements such as the statistical analysis plan (SAP). The beauty of the practice is that the same results are reached by two or more different approaches. Often the approaches are independent of each other and may even begin with divergent understandings of a particular analysis method. Ultimately, accuracy is achieved when differences are reconciled and critical understandings converge, just like the teamwork between the left hand and the right hand!
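
The essence of double programming is that two independently written programs must produce the same numbers. Any disagreement is then reconciled. The Python sketch below shows this on a toy example: mean change from baseline by treatment arm. The data, the function names and the tolerance are hypothetical. In a SAS shop the comparison step would usually be done with PROC COMPARE.

```python
import math
from collections import defaultdict
from statistics import fmean

# Hypothetical analysis data: (subject_id, arm, change_from_baseline).
DATA = [
    ("001", "Placebo", -1.0),
    ("002", "Placebo", 0.5),
    ("003", "Active", -3.0),
    ("004", "Active", -2.0),
]


def production_means(rows):
    """Initial programmer: accumulate sums and counts in one pass."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _, arm, value in rows:
        sums[arm] += value
        counts[arm] += 1
    return {arm: sums[arm] / counts[arm] for arm in sums}


def qc_means(rows):
    """QC programmer: independently group values, then use statistics.fmean."""
    arms = sorted({arm for _, arm, _ in rows})
    return {arm: fmean(v for _, a, v in rows if a == arm) for arm in arms}


def reconcile(prod, qc, rel_tol=1e-9):
    """List every arm whose results differ between the two programs."""
    issues = []
    for arm in sorted(set(prod) | set(qc)):
        if arm not in prod or arm not in qc:
            issues.append(f"{arm}: present in only one output")
        elif not math.isclose(prod[arm], qc[arm], rel_tol=rel_tol):
            issues.append(f"{arm}: production {prod[arm]} vs QC {qc[arm]}")
    return issues


print(reconcile(production_means(DATA), qc_means(DATA)))  # []
```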

Generally speaking, the initial programmer has certain advantages, such as choosing naming conventions, setting up output layouts, and selecting the statistical procedures to apply. The QC programmer generally focuses on checking the accuracy of the output content and analysis results, and follows the conventions the initial programmer has set up.

Some factors to consider when fostering strong collaboration between left-hand and right-hand programming are:

  • A win-win culture: the approaches are independent, but the goal is the same. Ultimately, that goal is to prevent mistakes in any programming and analysis. The initial programmer should always self-check first for initial quality assurance before handing over to the QC programmer for independent review.
  • Avoid cosmetic overkill: both parties should stick to simple, common, and effective conventions. Overly detailed formatting efforts cost extra time and make it harder to reconcile non-essential content, e.g., variables concatenated with special characters and calculated spaces.
  • Constant contact: both parties should talk to each other regularly to stay productive. Changes are usually inevitable; for example, analysis methods may change several times during a study before the SAP is finalized. If both parties are informed of change requests at the same time, they can revise their respective programs in parallel rather than sequentially.
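
One way to keep reconciliation focused on content is for the QC side to compare the numbers inside a display cell rather than its exact spacing. The Python sketch below does this for `n (pct%)` cells. The cell pattern is a hypothetical convention, and the helper names are illustrative only.

```python
import re

# Matches cells such as "12 (34.3%)" or "12  ( 34.3 %)", with any spacing.
CELL = re.compile(r"^\s*(\d+)\s*\(\s*(\d+(?:\.\d+)?)\s*%\s*\)\s*$")


def parse_n_pct(cell: str) -> tuple[int, float]:
    """Extract the count and percentage from an 'n (pct%)' display cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return int(match.group(1)), float(match.group(2))


def same_content(prod_cell: str, qc_cell: str) -> bool:
    """Compare the numbers in two cells, ignoring spacing differences."""
    return parse_n_pct(prod_cell) == parse_n_pct(qc_cell)


print(same_content("12 (34.3%)", "12  ( 34.3 %)"))  # True
```

Spacing differences are then not reported at all, while a genuinely different count or percentage still is.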

by Haibin Shu

Does One Clinical System Fit All Studies?

The answer is ‘No’, because it is very hard to build or find a system that fits the needs of all clinical trials. Selecting the most suitable and effective system is therefore very important before launching upcoming clinical trials.

In theory, one might think it possible to include all factors when creating such a system; in reality, however, the effort would be too great to justify. Clinical trial requirements also keep evolving: the best paper-based system in history, for example, would fail to meet the basic needs of a simple study today that requires electronic data capture.

Some factors to consider when selecting a clinical system:

  • Experience of site users: less experienced users may need more robust, less error-prone systems for data entry and other necessary tasks.
  • Study design, including visit structure, key data points, etc.: the more complex a study is, the more programming it may require, such as edit checks and metrics reports. Such a study may therefore need a system with strong back-end programming capabilities.
  • SAS extracts: many systems offer on-demand data extracts, which make it convenient to pull real-time data for reporting and analyses. Conversely, for systems without this capability, getting SAS data sets out of the system can become a daunting task, and no reporting or analysis can proceed until the SAS data sets are generated.
  • Balance of front end and back end: a system can easily be voted down if its front-end functions are not desirable; on the other hand, attention should be paid to the back-end functions as well. Selecting a system with friendly front pages but insufficient back-end functions is like driving a nice-looking car without a good engine!

by Haibin Shu

How to Set Up Dropbox as an Effective Programming Environment?

Dropbox provides a cost-effective, secure, shared environment for SAS programmers. It is quick to set up and easy to operate. Furthermore, it provides a server-like platform for programmers to develop, share, and execute code.

A Cost-Effective Environment

Dropbox is a cost-effective web-based application that provides file storage and file management functions integrated with local systems. It avoids heavy IT spending by reducing hardware and software procurement as well as maintenance needs.

Quick Set-up and Easy Operation

  1. Download and install the Dropbox desktop application.
  2. Share the Dropbox network path.
  3. Map the above network path to a common drive, e.g. Y:

Synchronization can be done selectively, so each computer only needs to sync the folders it uses.

As long as every computer maps Dropbox to the same drive letter and uses the same study folder structure, both programs and data sets can be invoked from any computer with SAS.
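
The point of a common drive letter is that an absolute path built from the letter plus a fixed folder layout is identical on every computer. The small Python sketch below builds those paths and renders SAS `LIBNAME` statements from them. The drive `Y:`, the study name `STUDY001`, the subfolder layout and the librefs are hypothetical placeholders for whatever convention a team adopts.

```python
from pathlib import PureWindowsPath

# Hypothetical values: the shared drive letter and a study folder layout that
# every programmer agrees on.
DRIVE = "Y:/"
SUBFOLDERS = {"raw": "data/raw", "sdtm": "data/sdtm", "adam": "data/adam"}


def study_paths(study: str, drive: str = DRIVE) -> dict[str, PureWindowsPath]:
    """Build the same absolute paths on every computer that maps the drive."""
    root = PureWindowsPath(drive) / study
    return {libref: root / sub for libref, sub in SUBFOLDERS.items()}


def libname_statements(study: str) -> list[str]:
    """Render SAS LIBNAME statements pointing at the shared folders."""
    return [f"libname {libref} '{path}';" for libref, path in study_paths(study).items()]


for statement in libname_statements("STUDY001"):
    print(statement)
```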

by Melissa

How To Ensure Your Topline Results Are Correct?

Ensuring that topline results are correct has become more challenging now that analyses are prepared in a CDISC environment. CDISC itself is a good thing, since it provides a standard, uniform framework for sharing and reviewing clinical data. However, creating CDISC data sets often involves many data manipulations, which inevitably raises a natural but critical question: how can we make sure that none of these intermediate steps that change or reformat data unintentionally introduces errors or bias into the analysis results?

Approach 1. Double Programming to Make Sure CDISC Conversions Are Accurate

This is probably the most common and conventional way to address the concern and ensure the quality of analysis results. Of course, it means extra time and resources. In particular, double programming the CDISC data sets and then comparing and reconciling all differences can take a lot of time, and it requires a good process and team to accomplish. More importantly, the analysis results themselves still have to be verified after this CDISC double programming process is complete.
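
Reconciling double-programmed CDISC data sets usually means matching records on key variables and then reporting every value that differs. This is what PROC COMPARE with an ID statement does in SAS. The Python sketch below mirrors that idea for records held as dictionaries. The function name, the example keys `USUBJID`/`PARAMCD` and the numeric tolerance are illustrative assumptions.

```python
def compare_by_key(prod_rows, qc_rows, keys, tol=1e-9):
    """Compare two lists of dict records matched on key variables.

    Returns human-readable differences: records found on only one side,
    and variables whose values disagree for the same key.
    """
    def index(rows, label):
        indexed = {}
        for row in rows:
            key = tuple(row[k] for k in keys)
            if key in indexed:
                raise ValueError(f"duplicate key {key} in {label}")
            indexed[key] = row
        return indexed

    prod, qc = index(prod_rows, "production"), index(qc_rows, "QC")
    diffs = []
    for key in sorted(set(prod) | set(qc)):
        if key not in qc:
            diffs.append(f"{key}: only in production")
            continue
        if key not in prod:
            diffs.append(f"{key}: only in QC")
            continue
        for var in sorted(set(prod[key]) | set(qc[key])):
            a, b = prod[key].get(var), qc[key].get(var)
            if isinstance(a, float) and isinstance(b, float):
                equal = abs(a - b) <= tol
            else:
                equal = a == b
            if not equal:
                diffs.append(f"{key}: {var} production={a!r} QC={b!r}")
    return diffs


prod = [{"USUBJID": "001", "PARAMCD": "SYSBP", "AVAL": 120.0},
        {"USUBJID": "002", "PARAMCD": "SYSBP", "AVAL": 135.0}]
qc = [{"USUBJID": "001", "PARAMCD": "SYSBP", "AVAL": 120.0},
      {"USUBJID": "002", "PARAMCD": "SYSBP", "AVAL": 153.0}]
for diff in compare_by_key(prod, qc, ["USUBJID", "PARAMCD"]):
    print(diff)
```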

Approach 2. Raw Data Approach

This approach verifies the analysis results directly from the raw data sets. First, it can be performed in parallel with the CDISC approach because it does not depend on CDISC conventions; this could save a great deal of turnaround time. For the same reason, it is an entirely independent process. Moreover, it verifies not only the analysis results but also the CDISC conversions.
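
As an illustration of the raw data approach, a topline statistic such as the responder rate by arm can be derived straight from raw randomization and CRF records. The result is then checked against the number produced through the CDISC pipeline. In the Python sketch below, the raw layouts, the `responder` flag and the reported rates are hypothetical. The sketch also assumes that the denominator is all randomized subjects and that subjects with no efficacy record count as non-responders.

```python
from collections import Counter

# Hypothetical raw data: randomization list and a raw efficacy CRF page.
RANDOMIZATION = {"001": "Placebo", "002": "Active", "003": "Active", "004": "Placebo"}
RAW_EFFICACY = [
    {"subject": "001", "responder": "N"},
    {"subject": "002", "responder": "Y"},
    {"subject": "003", "responder": "Y"},
    {"subject": "004", "responder": "Y"},
]


def responder_rates_from_raw(randomization, efficacy):
    """Derive responder percentages by arm directly from raw records.

    Assumption: denominator = all randomized subjects in the arm; subjects
    without an efficacy record are counted as non-responders.
    """
    responders, totals = Counter(), Counter(randomization.values())
    for row in efficacy:
        if row["responder"] == "Y":
            responders[randomization[row["subject"]]] += 1
    return {arm: 100.0 * responders[arm] / n for arm, n in totals.items()}


def verify_topline(raw_rates, topline_rates, tolerance=0.05):
    """Return arms whose reported topline rate disagrees with the raw derivation.

    A tolerance of 0.05 allows for rates rounded to one decimal in the report.
    """
    mismatches = {}
    for arm, reported in topline_rates.items():
        derived = raw_rates.get(arm)
        if derived is None or abs(derived - reported) > tolerance:
            mismatches[arm] = (derived, reported)
    return mismatches


raw_rates = responder_rates_from_raw(RANDOMIZATION, RAW_EFFICACY)
print(verify_topline(raw_rates, {"Placebo": 50.0, "Active": 100.0}))  # {}
```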

Topline results usually require a fast turnaround for good business reasons. The raw-data-based approach is worth considering because of its efficiency and independence.