Infodump and Excel
From Mefi Wiki
[edit] Opening infodump files in Excel
Get & unpack the file. I'm using MetaTalk post stats for this, because it's one of the smaller files.
Open the file. It probably won't show the file by default - instead of showing Files of type: All Excel Files, show Text Files instead.
Note that many of the Infodump files have enough rows to overwhelm Excel. See Infodump#File_size for file sizes & spreadsheet limits.
The Text Import Wizard should start:
- Step 1: It's delimited, not Fixed width.
- Step 2: Select only Tab as the delimiter.
- Step 3: Finish.
The datestamp format is a bit off: instead of showing "2000-03-03 18:22:59.870", it shows up as "22:59.9" here. The full datestamp is still there, it's just not being displayed. You can change it to your liking - I like the format "3/14/01 1:30 PM".
[edit] Do Stuff
The first line is the datestamp for the database dump. This can get in the way.
The second line is a header, labelling each field type.
You could now "Select All," go to Data then Sort, tell Excel that there is a header row, & find
- the threads with the most comments - in Metatalk, post 15931 currently has the most, 3649
- the most favorites - post 15606 currently has the most, 270.
Filtering by deleted values of 1 shows only the deleted posts - sorting by userid then scanning shows that userid 191 has the most deleted posts in MetaTalk. (That would be admin pb, deleting test posts.)
[edit] Sorting popular tags
I'm using Excel 2007, which supports more rows. Tagdata_mefi has 366k rows, with multiple rows for posts with multiple tags. I removed the date line, then sorted the tag column alphabetically. Googling "count uniq rows excel" (I was looking for equivalents for the unix command uniq) led me to a way to count tags using a pivot table. I copied the pivot table report & used Paste Special to put the values into a new spreadsheet, which I then sorted by Count of tag_name, & started scanning.
There were 79k unique tags.
For comparison, using Debian Linux tools, here's the command to produce the equivalent list: "sort -f -k 5 tagdata_mefi.txt | uniq -c -i -f 4 | sort -rn | less" That leaves some unnecessary columns.

