
If you look at dates and names, you can see that the data is sometimes fuzzy and messy. Must it be flattened to fit into precise, specific systems? Image, data.

A shift from reading pages to reading a dataset enables entirely new research questions.

Can then offer collections of metadata, of text, of images, for reading individually or mining as a dataset.

'Big data' in cultural heritage

What kinds of data are we talking about?

At the very least, providing photographs of pages, which can then be transcribed as text.

Image: The storage void of the new British Library National Newspaper Building at Boston Spa in West Yorkshire.

In some contexts, it is important to analyse data as quickly as possible, even in real time e.g. when your bank texts you re possible fraudulent transaction

Big data often involves bringing together data from different sources e.g.

It's easier than ever before to make creepy

– what's left out of the analysis, and why?
– how confident you are in matches, results
– how it was transformed, cleaned to fit into
– provenance and qualities of original dataset(s)

Do they under- or over-report any factors?

Records that used 'United Kingdom' tens of thousands of records that used Great Britain, England, Scotland, Wales, Northern Ireland etc

Opportunity: time to get to know the data

These are not the same place (if you're a

Reformatting (unless everything is ready to be
Collecting (unless everything is already
Digitising (unless everything is already

IBM Watson's used by oncologists at Memorial Sloan-Kettering Cancer Center, suggestions 'informed by data from 600,000 medical evidence reports, 1.5 million patient records and clinical trials, and two million pages of text from medical

Microsoft similarly use machine learning and natural language processing to sort through

Machine learning, artificial intelligence

Recruitment - shortlisting CVs to job ads
Veritas case study, 'Early Case Assessment in Electronic Discovery'
Personalised treatment plans for cancer patients

My experience at Cooper Hewitt: 20% of my residency 'dealing with the sheer size of the dataset: it's tricky to load 60mb worth of 270,000
'search-and-replace cleaning takes a long time'
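
One way to get to know a dataset of this size without loading it all at once, and without search-and-replace, is to stream it row by row and count the raw values in a field. The sketch below is only an illustration, not the method used at Cooper Hewitt: the column name `place`, the `WATCHLIST` of terms and the sample rows are all hypothetical. It leaves the original values untouched and only reports which variants are present.

```python
import csv
import io
from collections import Counter

# Hypothetical watchlist of terms that are easy to conflate but are not the same place.
WATCHLIST = {"united kingdom", "great britain", "england",
             "scotland", "wales", "northern ireland"}


def tally_places(lines, column="place"):
    """Read rows one at a time and count each raw value found in `column`."""
    counts = Counter()
    for row in csv.DictReader(lines):
        # Missing or empty cells are counted under the empty string.
        value = (row.get(column) or "").strip()
        counts[value] += 1
    return counts


def watched_variants(counts, watchlist=WATCHLIST):
    """Return only the tallied values that appear in the watchlist (case-insensitive)."""
    return {value: n for value, n in counts.items() if value.lower() in watchlist}


# Made-up sample rows standing in for a real export.
sample = io.StringIO(
    "id,place\n"
    "1,England\n"
    "2,United Kingdom\n"
    "3,England\n"
    "4,Scotland\n"
    "5,\n"
)

counts = tally_places(sample)
variants = watched_variants(counts)

for value, n in sorted(variants.items()):
    print(f"{value}: {n}")
```

For a real file, the same function can be given an open file handle, for example `open(path, newline="", encoding="utf-8")`, so only one row is held in memory at a time. Counting before changing anything also keeps a record of what the data looked like before any cleaning.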
