Converting Latin1 charset tables with UTF8 data set
Mar 15, 2011
Objective
To migrate TYPO3 records (pages, tt_news and tt_content) to Drupal. The TYPO3 site was multilingual: English, Danish and Greenlandic. The TYPO3 database had tables declared with the Latin1 charset that actually held UTF8-encoded data, and these needed to be converted to UTF8 for the Drupal database.
Initial Approach
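Change the database and table charset to UTF8, which should convert the Latin1 data to UTF8, with commands like the ones below. Note that the first re-encodes an existing column, while the second only changes the table's default charset for columns added later.
- ALTER TABLE {tablename} MODIFY {column} CHAR(20) CHARACTER SET utf8;
- ALTER TABLE {tablename} DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;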
Problem
When we fetched the records through phpMyAdmin in the browser, the text showed junk characters.

But checking the same record through the mysql client in a terminal displayed the correct results.
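
When two clients disagree like this, one way to find out what is really stored is to look at the raw bytes (for example, the output of SELECT HEX(column)) rather than at rendered text. The following is a minimal sketch of that check; the function name and the sample word "Grønland" are illustrative assumptions, not taken from the original migration.

```python
def stored_as_utf8(raw: bytes) -> bool:
    """Return True if raw contains non-ASCII bytes that all form valid UTF-8."""
    if raw.isascii():
        # Pure ASCII is byte-identical in Latin1 and UTF-8, so it proves nothing.
        return False
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # A genuine Latin1 byte such as 0xF8 ("ø") is not valid UTF-8 on its own.
        return False
    return True


# Hypothetical HEX() output for a column value holding "Grønland" as UTF-8.
sample = bytes.fromhex("4772C3B86E6C616E64")
print(stored_as_utf8(sample))  # True
```

A value that was genuinely stored as Latin1 would contain the single byte F8 for "ø" and fail this check.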

Final Solution
The data was already in UTF8, so converting the tables or columns from Latin1 to UTF8 re-encoded it and produced junk characters. The strange thing was that the MySQL client in the terminal was displaying it correctly.
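
The effect can be reproduced outside the database. The sketch below is only an approximation: it uses Python's latin-1 codec to stand in for MySQL's latin1 (which is closer to cp1252), and the sample word is an assumption. It shows that a charset conversion treats every stored byte as one Latin1 character and encodes it again, while the untouched bytes are already readable as UTF8. That would also be consistent with a terminal client that passes the bytes through unchanged to a UTF8 terminal, although we did not verify the connection settings.

```python
# Bytes the application wrote into the Latin1 column: UTF-8 for "Grønland".
stored = "Grønland".encode("utf-8")


def convert_latin1_to_utf8(raw: bytes) -> bytes:
    """Mimic a Latin1 -> UTF8 column conversion: decode as Latin1, re-encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


broken = convert_latin1_to_utf8(stored)

print(broken.decode("utf-8"))  # GrÃ¸nland  (double-encoded junk)
print(stored.decode("utf-8"))  # Grønland   (original bytes, read as UTF-8)
```

So the bytes themselves must stay as they are; only the charset label on the column needs to change.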
After searching on Google we came upon this post, http://bit.ly/1RAqTO, which gave us the breakthrough. The solution was to convert each field to its BLOB (binary) equivalent and then convert the BLOB field back to a text type with the UTF8 charset. Following this pattern, we solved our problems. The Drupal site is going to be live at www.knr.gl shortly.
| Text type | Binary equivalent |
| --- | --- |
| CHAR | BINARY |
| VARCHAR | VARBINARY |
| TINYTEXT | TINYBLOB |
| TEXT | BLOB |
| MEDIUMTEXT | MEDIUMBLOB |
| LONGTEXT | LONGBLOB |
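
The two-step pattern can be sketched as a small helper that builds the pair of ALTER statements for one column from the type pairs above. This is a hedged illustration, not the script we ran: the function name is hypothetical, and the table and column in the usage line are only examples.

```python
import re
from typing import List

# Text type -> binary equivalent, taken from the table above.
BINARY_TYPE = {
    "CHAR": "BINARY",
    "VARCHAR": "VARBINARY",
    "TINYTEXT": "TINYBLOB",
    "TEXT": "BLOB",
    "MEDIUMTEXT": "MEDIUMBLOB",
    "LONGTEXT": "LONGBLOB",
}


def blob_roundtrip_sql(table: str, column: str, column_type: str) -> List[str]:
    """Build the two statements: text -> binary, then binary -> text as utf8."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*(\(\d+\))?\s*", column_type)
    if match is None or match.group(1).upper() not in BINARY_TYPE:
        raise ValueError(f"unsupported column type: {column_type!r}")
    base = match.group(1).upper()
    length = match.group(2) or ""  # e.g. "(255)" for VARCHAR(255), else empty
    return [
        # Step 1: drop the charset label; the stored bytes are left untouched.
        f"ALTER TABLE `{table}` MODIFY `{column}` {BINARY_TYPE[base]}{length};",
        # Step 2: reattach a charset label, this time the correct one.
        f"ALTER TABLE `{table}` MODIFY `{column}` {base}{length} CHARACTER SET utf8;",
    ]


# Illustrative usage; substitute the real table, column and type.
for statement in blob_roundtrip_sql("tt_content", "bodytext", "MEDIUMTEXT"):
    print(statement)
```

Because MODIFY restates the whole column definition, attributes such as NOT NULL or DEFAULT would have to be added to the generated statements where a column has them, and it is worth taking a backup before running anything like this.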
