Data Mining – THATCamp DC 2017

Building History Databases: What’s Overkill?
Posted Mon, 27 Mar 2017

Hosted by: David M. (PhD, George Mason University)

Excel
Pros:
-Helpful for social media analysis

Cons:
-Can lead to information overload

Google Sheets, Google Docs, etc.
Pros:
-Easy to share with a broader community
-Able to receive feedback in real-time
-Can access data from anywhere with an internet connection
-Free: no database subscription fees or software expenses
-User-friendly

Cons:
-Prone to occasional glitches
-Fewer shortcuts than Excel
-Less advanced than Excel for visual analysis (e.g., pivot tables)
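A pivot table groups rows by one field and cross-tabulates them against another. As a rough illustration of what that summary computes (not how Excel or Google Sheets implement it), here is a minimal Python sketch, assuming the spreadsheet rows have been exported as a list of dictionaries. The field names and sample rows are hypothetical.

```python
from collections import defaultdict

def pivot_count(rows, row_key, col_key):
    """Count rows by (row_key, col_key), like a pivot table using COUNT."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += 1
    # Convert the nested defaultdicts to plain dicts for readable output.
    return {r: dict(cols) for r, cols in table.items()}

# Hypothetical rows, e.g. tweets exported from a spreadsheet.
tweets = [
    {"user": "a", "hashtag": "#thatcamp"},
    {"user": "b", "hashtag": "#dh"},
    {"user": "a", "hashtag": "#thatcamp"},
    {"user": "a", "hashtag": "#dh"},
]

print(pivot_count(tweets, "user", "hashtag"))
# {'a': {'#thatcamp': 2, '#dh': 1}, 'b': {'#dh': 1}}
```

Each outer key becomes a pivot-table row and each inner key a column; swapping `row_key` and `col_key` transposes the table.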

Alternative methods:
-Database(s)
-OpenRefine (openrefine.org)
-Sequel (SQL)

Conclusion:
-It all comes down to personal preference
-You can use several programs simultaneously for different needs
-Tailor your tools to match project needs
-Research alternative methods to speed up the process
-Share tools and tips with the digital community to improve everyone’s experience

Tool Sharing
Posted Sat, 25 Mar 2017

How to approach DH?

-Text analysis
-Social network analysis
-Geo-spatial mapping
-Distant reading / content analysis
-Visual/sound analysis
-Visualization

Resources

DiRT Directory (dirtdirectory.org)
-comprehensive registry of tools and resources to help you conduct research
-resources can be browsed by research approach (text analysis, numeric data, etc.)

TAGS (for Twitter data collection)
-lets you collect tweets matching your search on an ongoing basis
-requires only a Twitter account and a Google (Gmail) account
-uses Twitter’s API, which provides vast amounts of data, including location

Voyant (voyant-tools.org) for text analysis
-load your own dataset
-lets you turn humanities texts into quantifiable datasets, just as scientists and social scientists do
-shows (from left to right) a word cloud, an automatic summary (including words per sentence, frequent words, distinctive words, vocabulary density, etc.), the top five words, and the words preceding and following specific words
-includes a stopword tool to exclude words you do not want counted
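Voyant needs no code, but a few of the summary figures listed above are easy to reproduce. The following is a minimal Python sketch, not Voyant’s implementation. It assumes plain-text input and uses a tiny, hypothetical stopword list. Vocabulary density is computed here as unique words divided by total words, and `contexts` is a simple keyword-in-context view of the words before and after a term.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real tools ship much larger ones.
STOPWORDS = {"the", "a", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(tokens, n=5, stopwords=STOPWORDS):
    """Most frequent words, ignoring stopwords."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

def vocabulary_density(tokens):
    """Unique words divided by total words (closer to 1 = more varied)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def contexts(tokens, keyword, window=2):
    """Keyword in context: the words preceding and following each hit."""
    hits = []
    for i, t in enumerate(tokens):
        if t == keyword:
            hits.append((tokens[max(0, i - window):i], tokens[i + 1:i + 1 + window]))
    return hits

text = "The border was crossed. The travelers crossed the border in 1846."
tokens = tokenize(text)
print(top_words(tokens, n=3))      # [('border', 2), ('crossed', 2), ('was', 1)]
print(vocabulary_density(tokens))  # 0.6
print(contexts(tokens, "border"))  # [(['the'], ['was', 'crossed']), (['crossed', 'the'], ['in'])]
```

Note that the tokenizer drops numbers such as "1846"; a different tokenizer would change every figure, which is one reason counts can differ between tools.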

Programming Historian (programminghistorian.org)
-especially valuable in isolated regions where resources may be limited
-always looking for contributors
-tutorials are well written
-includes a lesson on using regular expressions to clean OCR text
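The Programming Historian lesson covers regex cleanup in depth. Purely as an illustration, here is a Python sketch of three common fixes. The patterns are assumptions about typical OCR noise, not code taken from the lesson.

```python
import re

def clean_ocr(text):
    """Apply a few common regex fixes to OCR output (illustrative, not exhaustive)."""
    # Rejoin words hyphenated across line breaks: "travel-\ners" -> "travelers".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks into spaces but keep paragraph breaks (blank lines).
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()

raw = "The travel-\ners crossed  the\nborder.\n\nNext page."
print(clean_ocr(raw))
```

The order matters: hyphenated breaks must be rejoined before other line breaks become spaces, or "travel- ers" would survive.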

OpenRefine (openrefine.org) for cleaning messy data
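One of OpenRefine’s cleanup features clusters spelling variants that should be the same value. The Python sketch below is similar in spirit to its "fingerprint" key-collision method: normalize case, accents, punctuation, and word order, then group values whose keys match. It is an approximation written for illustration, not OpenRefine’s code, and the sample names are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Normalized key: lowercase, no accents or punctuation, sorted unique tokens."""
    value = value.strip().lower()
    # Strip accents: decompose characters, then drop the combining marks.
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Remove punctuation, then sort and de-duplicate the remaining tokens.
    value = re.sub(r"[^\w\s]", "", value)
    return " ".join(sorted(set(value.split())))

def cluster(values):
    """Group values whose fingerprints collide; keep only groups with variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(set(g)) > 1]

names = ["Santa Fé", "santa fe", "Fe, Santa", "Monterey", "Monterrey"]
print(cluster(names))  # [['Santa Fé', 'santa fe', 'Fe, Santa']]
```

"Monterey" and "Monterrey" are not grouped because their letters differ; catching that kind of variant needs a similarity-based method rather than exact key collision.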

TextGrid Laboratory – downloadable application for text analysis
-upload photos of manuscripts
-can embed links, etc.

Gephi (gephi.org) for visualization

Palladio (hdlab.stanford.edu/palladio) for visualizing historical data
-well suited to exploration and designed to be user-friendly
-partially funded by the NEH

Google Books Ngram Viewer

Social network analysis
-lots of statistics
-all you need is two columns listing pairs of related persons
-difference from Palladio: shows nodes beyond the first degree of separation
-analysis includes:
-maximum geodesic distance, or diameter (the number of "hops", or degrees of separation, from one side of the network to the other)
-betweenness centrality (how often people have to go through you to reach another person)
-scores typically follow a "power law curve" (a few people are far more central than everyone else)
-eigenvector centrality, or "proximity to power" (how close you are to people with high centrality scores)
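All of these measures can be computed from just the two-column list of related persons. The following is a minimal, dependency-free Python sketch. It assumes an undirected, connected network. Betweenness uses Brandes’ algorithm (unnormalized). Eigenvector centrality uses power iteration on the adjacency matrix plus the identity, a common way to avoid oscillation. Tools such as Gephi may normalize these scores differently. The names are hypothetical.

```python
from collections import defaultdict, deque

def build_graph(pairs):
    """Undirected graph from two columns: (person, related person)."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Maximum geodesic distance: the longest shortest path in the network."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)

def betweenness(adj):
    """Brandes' algorithm: how often a node lies on shortest paths between others."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], defaultdict(list)
        sigma = defaultdict(int)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        while stack:  # visit nodes from farthest to nearest
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected path was counted once from each end.
    return {v: c / 2 for v, c in score.items()}

def eigenvector(adj, iterations=100):
    """Power iteration on (A + I): high scores go to nodes tied to high scorers."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in new.values()) ** 0.5
        x = {v: val / norm for v, val in new.items()}
    return x

pairs = [("Ana", "Ben"), ("Ben", "Cal"), ("Cal", "Dee")]
g = build_graph(pairs)
print(diameter(g))     # 3
print(betweenness(g))  # {'Ana': 0.0, 'Ben': 2.0, 'Cal': 2.0, 'Dee': 0.0}
print(eigenvector(g))
```

In this four-person chain, Ben and Cal sit between everyone else, so they receive all the betweenness and the highest eigenvector scores.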

Oxygen XML Editor

Omeka
-omeka.org and omeka.net
-free, easy, and pleasant to use
-very good at presenting all the metadata, making it very accessible
-handles manuscripts, images, audio, and video

Zotero
-good for managing articles and books, and for embedding
-items created in Zotero can be embedded in Omeka using a connecting tool

Building History Databases: What’s Overkill?
Posted Sat, 25 Mar 2017

Transcribed by: Sydney Thatcher

  • US and Mexican travelers crossing the border in 1846
  • More quantitative information than expected
    • Ship manifests, which include data such as people’s names and where they came from
  • Do historians build these databases only to organize their own data, or also to share the data with other researchers?
  • Hard to maintain an open-access database with such information
  • Example: someone at the New York Times made a Google Sheet that anyone could view
    • In Access and FileMaker you see only one page at a time
    • Excel lets you see about 50 entries at once
    • Google Sheets also lets multiple people collaborate in one place
    • Quantifying Kissinger: an example of the visualizations Excel data can support
    • Mapping, visualizations, and social network analysis
  • Gephi: visualization software that can be extended with plugins
    • GeoLayout plugin: positions nodes by latitude and longitude
  • Gene Bower (theory in DH): the relations you build into a database determine what information you can get out of it
  • Hard to tell whether simple data that fits in an Excel sheet can answer larger questions about the history of the period
  • Create multiple sheets in Excel for different kinds of information
  • Heuristnetwork.org: a possible middle ground between an Excel sheet and a database
    • Enter your data and define how the records should connect
  • Sequel (SQL): lets you link multiple ships and people together
  • OpenRefine (openrefine.org): cleans up messy data, such as in an Excel sheet
  • Carto (carto.com)
    • Robust mapping tool that also has a timeline feature
  • Quantifying Kissinger is a good source of ideas for the kinds of visualizations possible with solid Excel skills
  • NodeXL: add Excel sheets and experiment with them
    • As Diana Cline has used it
  • Palladio: another tool for building visualizations
  • Lincoln Mullen: statistical analysis and visual analysis
  • Introtodh2016.web.unc.ed/workshops/mapping
    • Has some examples of mapping sources
  • Neatline
    • Neatline is geared toward presentation; Carto and D3 are more research tools
  • D3: a JavaScript library that lets you present your research the way you want to see it
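The note above about using SQL to link ships and people can be illustrated with a small SQLite database, created with Python’s built-in `sqlite3` module. The three-table design is a hypothetical schema, not one presented in the session: a `ship` table, a `person` table, and a `passage` link table. All rows are placeholders, not historical data.

```python
import sqlite3

# In-memory database for illustration; pass a file path to keep the data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ship   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, origin TEXT);
-- Link table: one row per person per voyage, so a person can appear on many
-- ships and a ship can carry many people (a many-to-many relationship).
CREATE TABLE passage (
    person_id INTEGER REFERENCES person(id),
    ship_id   INTEGER REFERENCES ship(id),
    year      INTEGER
);
""")

# Placeholder rows, not historical data.
conn.executemany("INSERT INTO ship VALUES (?, ?)", [(1, "Ship A"), (2, "Ship B")])
conn.executemany("INSERT INTO person VALUES (?, ?, ?)",
                 [(1, "Traveler 1", "Port X"), (2, "Traveler 2", "Port Y")])
conn.executemany("INSERT INTO passage VALUES (?, ?, ?)",
                 [(1, 1, 1846), (2, 1, 1846), (1, 2, 1846)])

def passengers(ship_name):
    """Everyone recorded on a given ship, found through the link table."""
    return conn.execute("""
        SELECT person.name, person.origin
        FROM passage
        JOIN person ON person.id = passage.person_id
        JOIN ship   ON ship.id   = passage.ship_id
        WHERE ship.name = ?
        ORDER BY person.name
    """, (ship_name,)).fetchall()

def ships_of(person_name):
    """Every ship a given person traveled on."""
    rows = conn.execute("""
        SELECT ship.name
        FROM passage
        JOIN person ON person.id = passage.person_id
        JOIN ship   ON ship.id   = passage.ship_id
        WHERE person.name = ?
        ORDER BY ship.name
    """, (person_name,)).fetchall()
    return [name for (name,) in rows]

print(passengers("Ship A"))    # [('Traveler 1', 'Port X'), ('Traveler 2', 'Port Y')]
print(ships_of("Traveler 1"))  # ['Ship A', 'Ship B']
```

This is the kind of two-way question ("who was on this ship?" and "which ships was this person on?") that becomes awkward with one flat spreadsheet sheet per ship.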