We just realized that we've misspelled two words, one in each of the Inpatient and Outpatient sets. Our schema has them wrong as well, and so does each individual FY sql file. With any text editor, one can replace them with ease. With vim, the substitutions are:
Inpatient: AveCoverCharge --> AvgCoverCharge
Outpatient: AveEstSubCharge --> AvgEstSubCharge
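For anyone who would rather script the two renames above than open each file in an editor, here is a minimal Python sketch. It only works on a string of SQL text. The sample INSERT statement and its table name are made up for illustration and are not copied from our files.

```python
import re

# Misspelled field name -> corrected field name, as listed above.
RENAMES = {
    "AveCoverCharge": "AvgCoverCharge",
    "AveEstSubCharge": "AvgEstSubCharge",
}

def fix_field_names(sql_text: str) -> str:
    """Return sql_text with each misspelled field name replaced."""
    for old, new in RENAMES.items():
        # \b limits the match to whole identifiers only.
        sql_text = re.sub(rf"\b{old}\b", new, sql_text)
    return sql_text

if __name__ == "__main__":
    # Hypothetical statement, only to show the replacement.
    sample = "INSERT INTO Inpatient (ID, AveCoverCharge) VALUES (1, 100.0);"
    print(fix_field_names(sample))
```

To apply it to a real file you would read the file, pass its text through fix_field_names, and write the result back. Keep a backup copy first.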
Unfortunately, there are also about 15k blank rows in the dataset below. My Python code did not remove them after parsing the Excel files. Oh well!
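A filter along these lines could drop such rows before the sql files are written. This is a sketch, not our actual parser: it assumes each parsed row is a list of cell values and that a "blank" row is one where every cell is None or empty text. The sample rows are invented.

```python
def is_blank(row):
    """True if every cell in the row is None or whitespace-only text."""
    return all(cell is None or str(cell).strip() == "" for cell in row)

def drop_blank_rows(rows):
    """Return only the rows that contain at least one non-empty cell."""
    return [row for row in rows if not is_blank(row)]

if __name__ == "__main__":
    # Hypothetical rows, as they might come out of a spreadsheet reader.
    rows = [
        ["A001", "MD", 1200.50],
        [None, "", "   "],
        ["A002", "VA", 980.00],
    ]
    print(drop_blank_rows(rows))
```

If the blank rows in the real files carry an ID but nothing else, the test in is_blank would need to skip the ID column.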
This is a custom version built for our project, for research purposes.
If you wish to use the following as an example of a MySQL DB for a school or pet project, we think that is fine.
Maybe send us an email telling us how you are using the data. Be sure to give us credit if you use our package.
We may improve this process in the future. We fixed the cosc880.sql file to include DB creation instead of leaving it for the user to do.
The README.txt is now accurate. The topmost file is what we plan to use for our research; the separated files below are for anyone who wishes to pick and choose, but the DB schema is a must-have item.
Without it, the data set will not work. You can make up your own DB schema, but you would need to alter ours and also alter every single line in each of the sql files, since we decided early on to include each of the DB fields in each of the data files.
The Outpatient data set has roughly 120k to 150k rows of data. Together, the Inpatient and Outpatient data sets come to roughly 1 million records from 2011 to 2014.
You will need 7zip to unzip the sql files, and either a MySQL or MariaDB engine to load them. We recommend installing all of the required software first, then installing the DB schema and the data from CMS.gov from below.
The Inpatient data alone is almost 800k rows. The IDs should be unique, so there is no need to install the files in order, but it might be best to install each sql file in order anyway. If you can install them out of order, good for you! email=--aschenbach--AT--gmail--com
DB Schema
Revised version. Medicare Provider fully loaded. sha1sum
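If you do not have the sha1sum command handy, the checksum can also be checked with a few lines of Python. This is only a sketch: the file path and the published digest are whatever you downloaded, and both function names here are our own invention.

```python
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so large archives do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """Compare the file's digest with the published hex digest."""
    return sha1_of_file(path) == published_hex.strip().lower()
```

Pass only the hex digest as published_hex, not the whole line of a sha1sum file, which also carries the file name.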
There is another large dataset from CMS.gov that we are thinking of parsing, as a side project, to generate sql files with a DB schema. It is similar to the dataset above but has additional fields. It covers drugs and drug costs for each US state, based on Medicare providers during the years 2013, 2014, and 2015. The dataset will total around 12 GB of text for 3 years' worth of data, with the SQL statements included.
Anaconda Statistical Package
Installing many of these tools is far easier on a Linux platform, but there are also tools available for installing them on Windows.
bitnami