About the American Civil War Research Database


The study and review of the American Civil War has generally focused on senior military officers and major battles rather than on an analytical look at the War from the perspective of the individual soldier. Using information from each soldier's military and civilian experiences to build a database from the "ground up" rather than from the "top down", Historical Data Systems has created the only database of its kind that can be used for statistical and analytical examinations of the War. It is now possible to examine and measure the impact these individual soldier experiences had upon regimental effectiveness.

The American Civil War Research Database is a relational database. This means that there are numerous files (e.g., roster records, pension index records, GAR records) which are "related" to each other. In HDS' Database, the "relationship" or connection between these multiple files is the soldier's name. These files contain information gathered from the different sources discussed below. (See the bibliography section for a data source list.) Not every soldier is in every file, nor is the type of information about a soldier in a particular file the same for every other soldier in that same file. Also, the pace at which we at HDS can enter the information about each soldier varies significantly with the type of source documents available to us. Therefore, the Database is not a complete biographical story of each soldier, but rather a source of significant war and post-war events that a soldier experienced, which can be used for the study of the Civil War.
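To illustrate the idea of separate files related by the soldier's name, here is a minimal Python sketch. The file names, field names, and sample rows are hypothetical and chosen only for illustration; they do not describe the actual HDS schema.

```python
from collections import defaultdict

# Hypothetical sample rows; names, fields, and values are illustrative only.
roster_records = [
    {"name": "John Doe", "regiment": "1st Example Infantry"},
    {"name": "Richard Roe", "regiment": "2nd Example Infantry"},
]
pension_index_records = [
    {"name": "John Doe", "filed_by": "widow"},
]
gar_records = []  # a file may hold nothing at all for a given soldier


def build_soldier_index(**files):
    """Group rows from every file under the soldier's name (the shared key)."""
    index = defaultdict(dict)
    for file_name, rows in files.items():
        for row in rows:
            index[row["name"]].setdefault(file_name, []).append(row)
    return dict(index)


soldiers = build_soldier_index(
    roster=roster_records,
    pension_index=pension_index_records,
    gar=gar_records,
)

# Not every soldier appears in every file.
print(sorted(soldiers["John Doe"]))     # files that mention John Doe
print(sorted(soldiers["Richard Roe"]))  # files that mention Richard Roe
```

In this sketch a soldier's entry lists only the files in which he appears, which mirrors the point that not every soldier is in every file.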

Building a soldier's story is a continuous process that requires information from varied and diverse sources, many of which have been unavailable or inaccessible to the public. By utilizing relational database technology, HDS can continue to expand the Database as source information becomes available to us. Our primary source of information is the State Rosters as published by the Adjutants General of each state. In addition to this primary source, we have begun to enter information for Roll of Honor soldiers, soldiers awarded the Medal of Honor, soldiers or family members who filed for pensions (pension record index information), 1860 census town summary information, the 1890 enumeration of Civil War veterans or widows, regimental histories, etc. Data entry for these files is slower than for our primary source since the data entry process is more complex.

HDS has made a major investment of time, material, and money in building the Database. With over 100,000 hours already invested, the Database contains the records of over 4 1/4 million soldiers. We continue to add more soldiers and more information to the Database. We have released the Database to the Web now because we believe that it has reached a critical mass where significant research and study can be started. A greater appreciation of the cost of the War can be gained by statistically examining the data entered so far and extrapolating from many of the results.

 

THE HDS CHALLENGES

In building a "soldier-centric" database, Historical Data Systems is presented with four major challenges: gathering the data, completeness of data, quality of data, and merging multiple data sources into a single database.

There is no central repository for American Civil War soldier information. Thus, our first challenge is gathering the data. There are three key steps to this process: locating the appropriate information, gaining access to the data, and entering the data into the Database. The majority of the source documents used in building the Database are century-old books or microfilm images of original documents. After locating and gaining access to the data, the question of how to enter the information into the Database arises. The age and fragility of the source documents, as well as the font type and size, make scanning technology ineffective for data entry. Therefore, the data is manually keyed into the Database and then verified for accuracy. This is the most time-consuming and costly part of building the Database.

The second challenge, data completeness, refers to the type and amount of information available for each soldier. Our primary source of information for each soldier is the states' official records as published by the Adjutants General of each state. These are typically referred to as the "State Rosters" and contain information on every soldier from the state as well as brief regimental histories (which have been included as part of the Database). The inconsistency of the information published by each state in its "State Rosters" has led to variations in what is available for each soldier in the Database. By utilizing multiple data sources as well as information supplied by our subscribers, we are able to fill in many of the "holes" in a soldier's war and post-war record. A detailed checklist of information available by state is included in the "Database Status" section of the site. As can be seen there, Massachusetts records are among the most complete but unfortunately are atypical of how most states detailed the service records of their citizens.

The third challenge is data quality, or database integrity. Before being released to the web site, each record goes through a series of data edits. Many of the source records contain printing, spelling, date, or factual errors, or are incomplete. For example, with inter- and intra-regimental transfers, names may be recorded differently (the Database contains over 18,000 unique first names!) or aliases may be used, creating the appearance of multiple soldiers (aliases were not uncommon). Also, records were sometimes lost.

Using sophisticated edit routines, we catch many, but not all, of these errors. For example, we can determine that a military record is wrong if it lists a soldier as being wounded at the Battle of Gettysburg in July 1862 (rather than in 1863) or if he appears to have mustered out before mustering in. In many instances we can follow a soldier through a transfer even if there is a name change. Likewise, we catch (and list) many of the name variations for a soldier that appear throughout the various data sources used to build the Database.
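The two checks mentioned above (an event dated outside its battle, and a muster-out that precedes the muster-in) could be sketched roughly as follows in Python. This is not HDS' actual edit routine; the record layout, the `BATTLE_DATES` table, and the `check_record` function are assumptions made for illustration.

```python
from datetime import date

# Assumed reference table: battle name -> (first day, last day).
BATTLE_DATES = {
    "Gettysburg": (date(1863, 7, 1), date(1863, 7, 3)),
}


def check_record(record):
    """Return a list of problems found in one hypothetical service record."""
    problems = []

    muster_in = record.get("muster_in")
    muster_out = record.get("muster_out")
    if muster_in and muster_out and muster_out < muster_in:
        problems.append("mustered out before mustering in")

    for event in record.get("events", []):
        span = BATTLE_DATES.get(event["battle"])
        # Only battles present in the reference table can be checked.
        if span and not (span[0] <= event["date"] <= span[1]):
            problems.append(
                f"{event['type']} at {event['battle']} dated "
                f"{event['date']}, outside the battle dates"
            )
    return problems


# Hypothetical record containing both kinds of error.
suspect = {
    "muster_in": date(1862, 8, 1),
    "muster_out": date(1862, 5, 1),
    "events": [
        {"type": "wounded", "battle": "Gettysburg", "date": date(1862, 7, 2)},
    ],
}

for problem in check_record(suspect):
    print(problem)
```

A routine like this can only flag a record for review; deciding which value is the correct one still depends on the source documents.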

The fourth challenge is merging the various files of information. The "key relationship" between these files is the soldier's name, and matching names across files can be difficult. For example, G. Washington Smith (as in his roster record) may have been awarded the Medal of Honor for bravery, yet the Medal of Honor file has no G. Washington Smith but does have a G. W. Smithe, the Roll of Honor file has a G.W. Smythe, and the pension record index file has a George W. Smyth. HDS developed data-checking routines to catch many of these and other data issues and so reduce the occurrence of duplication and questionable data. However, even with these edit routines, HDS cannot guarantee that the Database is error-free. We welcome our subscribers' help in rectifying any errors or supplying us with information they may have about family members or others.
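One simple way to surface candidate matches such as the Smith / Smithe / Smythe / Smyth example is to reduce each name to a crude comparison key. The Python sketch below is an assumption made for illustration, not the routine HDS uses: the `name_key` rules (treat "y" as "i", drop a trailing "e", compare given names by initial only) are deliberately rough and will also produce false matches, so the groups are candidates for review rather than confirmed identities.

```python
import re
from collections import defaultdict


def name_key(full_name):
    """Reduce a name to (normalized surname, given-name initials)."""
    tokens = re.findall(r"[A-Za-z]+", full_name)  # handles "G.W." and "G. W."
    *given, surname = tokens
    surname = surname.lower().replace("y", "i")
    if surname.endswith("e"):
        surname = surname[:-1]
    initials = "".join(token[0].lower() for token in given)
    return surname, initials


def group_candidates(entries):
    """Group (file name, soldier name) pairs that share the same key."""
    groups = defaultdict(list)
    for file_name, full_name in entries:
        groups[name_key(full_name)].append((file_name, full_name))
    return dict(groups)


# The four spellings from the example above, one per file.
entries = [
    ("roster", "G. Washington Smith"),
    ("medal_of_honor", "G. W. Smithe"),
    ("roll_of_honor", "G.W. Smythe"),
    ("pension_index", "George W. Smyth"),
]

for key, members in group_candidates(entries).items():
    print(key, members)
```

Because the key keeps only initials, two different soldiers with the same surname and initials would also land in one group, so additional fields such as regiment or dates would be needed to confirm a match.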


Copyright 1997 - 2016 - Historical Data Systems, Inc.
P.O. Box 35
Duxbury, MA 02331
(800) 244-3446
(781) 934-1353
EMail HDS at civilwardata@sprynet.com