Split text file into Excel sheet and separate the final results intoa new sheet

  • Thread starter Luciano Paulino da Silva
  • Start date
L

Luciano Paulino da Silva

Dear All,
I would like to split one text file (example bellow) containing
several packs of data with the head "MW-....." into one excel file
for which each sheet should contain the data bellow the "MW... "
separated into four columns. After that, I will have to count the
number of times that apear each one of the strings present in the
columns (For example, to the sheet MW-
Silks_1_10-071f53284b36f9841994574243ecb063, it appears on column 3
the string coil 4 times.
Finally, the output should be something like:
For column 3:

coil turn helix bend bridge sheet
MW-Silks_1_10-071f53284b36f9841994574243ecb063 4
1 5 0 0 0
MW-Silks_1_10-07ab959b2314cae1f575921c5b0f7bce 6
0 0 6 1 0
MW-Silks_1_10-0c045e18accfb4d3c78af87e0fbda543 4
1 0 3 0 13

File example (simplified since they have until 500 kb of data):

MW-Silks_1_10-071f53284b36f9841994574243ecb063

255 ALA coil

256 ALA coil

257 ALA coil

258 ALA coil

259 GLY turn

260 GLY helix (helix_alpha, helix1)

261 ALA helix (helix_alpha, helix1)

262 GLY helix (helix_alpha, helix1)

263 GLN helix (helix_alpha, helix1)

264 GLY helix (helix_alpha, helix1)



MW-Silks_1_10-07ab959b2314cae1f575921c5b0f7bce

60 ALA coil

61 GLY coil

62 GLN coil

63 GLY coil

64 GLY bend

65 TYR bend

66 GLU coil

67 GLY bend

68 PRO bend

69 GLY bend

70 ALA bend

71 GLY coil

72 GLN bridge



MW-Silks_1_10-0c045e18accfb4d3c78af87e0fbda543

36 GLY sheet (sheet1, strand1_1)

37 GLY sheet (sheet1, strand1_1)

38 ALA sheet (sheet1, strand1_1)

39 GLY sheet (sheet1, strand1_1)

40 GLN sheet (sheet1, strand1_1)

41 GLY sheet (sheet1, strand1_1)

42 GLY sheet (sheet1, strand1_1)

43 TYR sheet (sheet1, strand1_1)

44 GLY sheet (sheet1, strand1_1)

45 GLY coil

46 GLN coil

47 GLY turn

48 ALA bend

49 GLY coil

50 GLN coil

51 GLY bend

52 ALA bend

53 ALA sheet (sheet1, strand1_2)

54 ALA sheet (sheet1, strand1_2)

55 ALA sheet (sheet1, strand1_2)

56 ALA sheet (sheet1, strand1_2)



Somebody could help me?
 
G

Glenn

Luciano said:
Dear All,
I would like to split one text file (example bellow) containing
several packs of data with the head "MW-....." into one excel file
for which each sheet should contain the data bellow the "MW... "
separated into four columns. After that, I will have to count the
number of times that apear each one of the strings present in the
columns (For example, to the sheet MW-
Silks_1_10-071f53284b36f9841994574243ecb063, it appears on column 3
the string coil 4 times.
Finally, the output should be something like:
For column 3:

coil turn helix bend bridge sheet
MW-Silks_1_10-071f53284b36f9841994574243ecb063 4
1 5 0 0 0
MW-Silks_1_10-07ab959b2314cae1f575921c5b0f7bce 6
0 0 6 1 0
MW-Silks_1_10-0c045e18accfb4d3c78af87e0fbda543 4
1 0 3 0 13

File example (simplified since they have until 500 kb of data):

MW-Silks_1_10-071f53284b36f9841994574243ecb063

255 ALA coil

256 ALA coil





Somebody could help me?


Put your data in a blank worksheet starting in A2. Put column headers in C1:G1
as appropriate (I used "Header", "Column1", "Column2", "Column3" and "Column4").
In C2:G2, enter the following formulas and copy down as needed:

C2 = IF(D2="","",IF(C1="",A1,C1))

D2 = IF(ISERROR(FIND(" ",A2)),"",TRIM(LEFT(A2,FIND(" ",A2))))

E2 = TRIM(MID(SUBSTITUTE(SUBSTITUTE($A2," ",REPT(" ",99),2)," ",
REPT(" ",99),1),99,99))

F2 = TRIM(MID(SUBSTITUTE(SUBSTITUTE(A2," ",REPT(" ",99),3)," ",
REPT(" ",99),2),99,99))

G2 = TRIM(MID(SUBSTITUTE(A2," ",REPT(" ",99),3),99,99))

Place your cursor in C2 and select Data / PivotTable and PivotChart Report and
then click Finished.

From the PivotTable Field List drag "Header" to the Row Fields area and drag
"Column3" to the Column Fields area and the Data Items area. In the drop down
for "Column3" in the Column Fields area, uncheck the blank data. You can drag
the individual column headers to get them in the proper order, if that is
necessary (default is alphabetical).
 
L

Luciano Paulino da Silva

Dear Glenn,
I'm having some problemas. Some of your stepes suggestions seems that
they did not work fine with me. Would it be possible that you send me
a sample sheet?
Thanks in advance,
Luciano
 
G

Glenn

Luciano said:
Dear Glenn,
I'm having some problemas. Some of your stepes suggestions seems that
they did not work fine with me. Would it be possible that you send me
a sample sheet?
Thanks in advance,
Luciano


http://www.savefile.com/files/2079043

I had to make some small modifications to the formulas to handle leading spaces
in the data I copied from your original message. Not sure if they are present
in the actual data or not, but if they are it should work now.

C2 = IF(D2="","",IF(C1="",A1,C1))

D2 = IF(ISERROR(FIND(" ",A2)),"",TRIM(LEFT(TRIM(A2),
FIND(" ",TRIM(A2)))))

E2 = TRIM(MID(SUBSTITUTE(SUBSTITUTE(TRIM(A2)," ",
REPT(" ",99),2)," ",REPT(" ",99),1),99,99))

F2 = TRIM(MID(SUBSTITUTE(SUBSTITUTE(TRIM(A2)," ",
REPT(" ",99),3)," ",REPT(" ",99),2),99,99))

G2 = TRIM(MID(SUBSTITUTE(TRIM(A2)," ",REPT(" ",99),3),99,99))
 
L

Luciano Paulino da Silva

Dear Glenn,
Thank you very much for your file.
The spaces are inserted. However, my data have hundreds of headers and
consequently data, but I can not understand how could I put them to
the PivotTable. The number of headers is variable for each situation.
Thanks in advance,
Luciano
 
G

Glenn

There were no limits to the solution I provided, unless your data is
significantly different than your example. Save a copy of your actual data to
www.savefile.com and I'll look at it.
 
G

Glenn

That file has over 870,000 lines of data in it, if I'm reading it correctly.
I'm using Excel 2003, which only has 65,536 rows, so I can't even process it all
at once. A smaller sample would have been sufficient.

Having said that, if I open your document in Word and copy a large chunk of data
into my spreadsheet, it works exactly as expected.

http://www.savefile.com/files/2079526
 
L

Luciano Paulino da Silva

Dear Glenn,
I think that it will be necessary that I find another strategy since
my text files are as large as that one I sent you.
Despite I use Excel 2007, I think that it will not be possible perfom
this operation for all cases.
Thank you very much for all your help.
Luciano
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top